SparkSession Docs
This is the entry point to Spark: the container that reads data into DataFrames, registers tables, executes commands on tables, caches tables, etc.
I think of it as my Spark kernel (much like the Python kernel you get when you enter 'python' into the terminal).
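A minimal sketch of that workflow, assuming a local session (the app name, file name, and column are made up for illustration):

```python
from pyspark.sql import SparkSession

# Build (or reuse) the session: the "kernel" everything else hangs off of.
spark = SparkSession.builder.appName("notes-demo").getOrCreate()

# Read a DataFrame, register it as a table, cache it, run SQL on it.
df = spark.read.json("events.json")  # hypothetical input file
df.createOrReplaceTempView("events")
spark.catalog.cacheTable("events")
spark.sql("SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type").show()
```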
Monitoring Docs
This is the first place I look when optimizing a job.
In the UI we can see jobs, the stages that run within each job, cached tables, individual executors, and the lineage of our DataFrames.
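By default the UI lives on port 4040 of the driver while the application is running. A quick sketch of finding it from the session (the job at the end is only there to give the Jobs and Stages tabs something to show):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ui-demo").getOrCreate()

# The UI address, e.g. http://localhost:4040
print(spark.sparkContext.uiWebUrl)

# Run a small job so the Jobs and Stages tabs have something to display.
spark.range(1_000_000).selectExpr("sum(id) AS total").show()
```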
DataFrameReader Docs
These are the instructions for how Spark should read the data: the format, the options, and (optionally) the schema.
Remember that the input gets split into partitions and spread across your executors. The split is not random; for file-based sources it follows the input's natural splits, typically one partition per file block.
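A sketch of a typical read plus a check on how the input was partitioned (the file name is hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("reader-demo").getOrCreate()

# The DataFrameReader accumulates format and options before the actual read.
df = (
    spark.read
    .option("header", "true")       # first line holds column names
    .option("inferSchema", "true")  # extra pass over the data to guess types
    .csv("transactions.csv")        # hypothetical input file
)

# Each partition is processed by one task on one executor.
print(df.rdd.getNumPartitions())
```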