Filter operations should be your best friends.

Filter early and filter often.



These operations can occur in parallel, since each row is tested independently of the others.
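
To make the parallelism concrete, here is a minimal sketch in plain Python. It assumes the data is already split into independent chunks; the names `chunks` and `keep_large` and the threshold of 6 are invented for illustration and are not tied to any particular framework.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data, already split into independent chunks.
chunks = [[1, 8, 3], [12, 5], [9, 2, 15]]


def keep_large(chunk, threshold=6):
    """Keep only the values above the threshold in one chunk."""
    return [value for value in chunk if value > threshold]


# Each chunk is filtered on its own, so no chunk has to wait on another.
# pool.map returns results in the same order as the input chunks.
with ThreadPoolExecutor() as pool:
    filtered_chunks = list(pool.map(keep_large, chunks))

# Flatten the per-chunk results into one list.
result = [value for chunk in filtered_chunks for value in chunk]
```

No chunk needs to see another chunk's rows to decide what to keep, which is what lets the work be spread across workers.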

Filtering is especially effective when you can filter on a column that is used to partition your dataset, since partitions that cannot match can often be skipped without being read.
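
As a rough illustration of why the partition column matters, the sketch below compares a full scan with a lookup that reads only the matching partition. It assumes a toy dataset held as a dict keyed by a hypothetical `country` partition value; `partitions`, `filter_full_scan`, and `filter_with_pruning` are names made up for this example, not any library's API.

```python
# Hypothetical dataset: rows grouped (partitioned) by the "country" column.
partitions = {
    "US": [{"country": "US", "amount": 10}, {"country": "US", "amount": 25}],
    "DE": [{"country": "DE", "amount": 7}],
    "JP": [{"country": "JP", "amount": 40}, {"country": "JP", "amount": 3}],
}


def filter_full_scan(partitions, country):
    """Filter by inspecting every row in every partition."""
    rows_read = 0
    matches = []
    for rows in partitions.values():
        for row in rows:
            rows_read += 1
            if row["country"] == country:
                matches.append(row)
    return matches, rows_read


def filter_with_pruning(partitions, country):
    """Filter on the partition column: only the matching partition is read."""
    rows = partitions.get(country, [])
    return list(rows), len(rows)


full_matches, full_reads = filter_full_scan(partitions, "JP")
pruned_matches, pruned_reads = filter_with_pruning(partitions, "JP")
```

Both calls return the same rows, but the pruned version reads only the rows in the matching partition instead of every row in the dataset.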