Navigate to the SQL tab and select `count`.
Notice the hash aggregations. Each executor counts its own results, and these partial counts are sent to the driver, which tabulates the total.
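The two-step count described above can be sketched in plain Python. This is a toy model, not Spark code: the `partitions` list, its contents, and the helper names `partial_count` and `final_count` are made up for illustration. It only shows the shape of the idea, where each partition is counted on its own and the small per-partition numbers are then added together.

```python
# Toy model of a two-phase count. Plain Python, no Spark involved.
# The partition contents and function names are hypothetical.

def partial_count(partition):
    """Step 1: one 'executor' counts the rows in its own partition."""
    return sum(1 for _ in partition)


def final_count(partial_counts):
    """Step 2: the per-partition counts are added into one total."""
    return sum(partial_counts)


# Four made-up partitions of a dataset, one of them empty.
partitions = [
    ["a", "b", "c"],
    ["d", "e"],
    [],
    ["f", "g", "h", "i"],
]

partial_counts = [partial_count(p) for p in partitions]
total = final_count(partial_counts)

print(partial_counts)  # [3, 2, 0, 4]
print(total)           # 9
```

Only one integer per partition has to be combined in the second step, rather than the rows themselves.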
Explanation of your DataFrame's lineage.
Spark is similar to SQL in that using only the necessary columns and filtering early will improve query speed.
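A rough way to see why this helps is to count how many field values have to be carried into later steps. The sketch below is a toy model in plain Python, not Spark, and it does not measure real query speed. The column names (`id`, `country`, `amount`, `notes`), the data, and the functions `filter_late` and `filter_early` are all assumptions made for illustration.

```python
# Toy model in plain Python (not Spark): compare how many field values
# are carried forward when filtering and selecting late versus early.
# All column names and data here are hypothetical.

rows = [
    {
        "id": i,
        "country": "US" if i % 4 == 0 else "DE",
        "amount": i * 10,
        "notes": "x" * 50,  # a wide column the query never needs
    }
    for i in range(1000)
]


def filter_late(rows):
    """Carry every row and column forward, then filter and select at the end."""
    carried = sum(len(r) for r in rows)  # all values are carried
    result = [
        {"id": r["id"], "amount": r["amount"]}
        for r in rows
        if r["country"] == "US"
    ]
    return result, carried


def filter_early(rows):
    """Filter rows and keep only the needed columns first."""
    kept = [
        {"id": r["id"], "amount": r["amount"]}
        for r in rows
        if r["country"] == "US"
    ]
    carried = sum(len(r) for r in kept)  # only the kept values are carried
    return kept, carried


late_result, late_cells = filter_late(rows)
early_result, early_cells = filter_early(rows)

print(late_result == early_result)  # True
print(late_cells, early_cells)      # 4000 500
```

Both versions return the same rows, but the early version carries an eighth as many values into whatever comes next.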