The execution plan is available through the DataFrame API and the web UI.

Execution plan in the web UI.

Navigate to the SQL tab and select the `count` query.

Notice the hash aggregations. Each executor first computes a partial count over its own partitions; the partial counts are then merged in a final aggregation, and the result is returned to the driver.

Execution plan with `df.explain()`.

`explain()` prints an explanation of your DataFrame's lineage.

As in SQL, selecting only the columns you need and filtering early improves query speed.

Beware: Execution plans can quickly become too long for the driver to keep in memory!

Prevent this by caching, checkpointing, and writing to disk when necessary.