This is the container that reads in DataFrames, registers tables, executes commands on tables, caches tables, etc.
I think of it as my Spark kernel (like the Python kernel you get when you type 'python' into the terminal).
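As a rough sketch of what that looks like in PySpark (assuming a Spark version with the SparkSession API; the app name and table name are just placeholders):

```python
from pyspark.sql import SparkSession

# The single entry point: reads DataFrames, registers tables, runs SQL, caches.
spark = (
    SparkSession.builder
    .appName("my_job")   # placeholder name
    .getOrCreate()
)

# Everything else goes through this one object.
df = spark.range(10)                           # tiny example DataFrame
df.createOrReplaceTempView("numbers")          # register it as a table
spark.sql("SELECT count(*) AS n FROM numbers").show()
```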
This is the first place I look when optimizing a job.
We can see jobs, stages (which occur within jobs), cached tables, individual executors, and the lineage of our DataFrames.
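To find the UI itself, a minimal sketch (assuming the `spark` session from above; the UI is served by the driver, on port 4040 by default):

```python
# Prints something like http://<driver-host>:4040, which hosts the
# Jobs, Stages, Storage, Executors, and SQL tabs.
print(spark.sparkContext.uiWebUrl)

# Cached tables only show up under the Storage tab after an action
# actually materializes them.
df.cache()
df.count()
```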
Instructions for how Spark should read the data.
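For example, a CSV read might look like the sketch below; the path, options, and schema are placeholders for whatever your data actually needs:

```python
from pyspark.sql import types as T

# Explicit schema so Spark doesn't need an extra pass to infer types.
schema = T.StructType([
    T.StructField("id", T.LongType(), nullable=False),
    T.StructField("amount", T.DoubleType(), nullable=True),
])

df = (
    spark.read
    .format("csv")
    .option("header", "true")    # first line holds the column names
    .schema(schema)
    .load("/path/to/data.csv")   # placeholder path
)
```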
Remember that each executor receives only a portion of the data: Spark splits the input into partitions and spreads those partitions across the executors, so no single executor sees the whole dataset.
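A quick way to see how the data got split up (again a sketch, using the `df` from the read above):

```python
# Each executor works on a subset of these partitions,
# never on the whole dataset at once.
print(df.rdd.getNumPartitions())

# Repartitioning changes how the data is spread across executors
# (note that this triggers a shuffle).
df = df.repartition(8)   # 8 is an arbitrary example
```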