SQL Tutorial

How I learned to stop worring and love the join

Created by Dan Vatterott / @dvatterott

S-Q-L or Sequel

  • “Structured English Query Language” (SEQUEL).
  • Replaced QUEL database language, and “sequel” was a pun.
  • Changed to “Structured Query Language” (SQL) due to trademark issues.

Read more here

SQL is hard

Syntax


							SELECT
							    *
							FROM
							    my_table
							LIMIT 5
						

listen_table

Customer_id listen_date artist artist_id song_id
1 "2017-11-01" "A Tribe Called Quest" 5.0 1.0
2 "2017-11-01" "A Tribe Call Quest" 5.0 7.0
2 "2017-11-02" "Wu-Tang Clan'" 1.0 7.0

Where statement

Create a list of people who listened to "A Tribe Called Quest".


								SELECT
								    DISTINCT user_id
								FROM
								    listen_table
								WHERE
								    artist = "A Tribe Called Quest"
							

Where statement: Reloaded

Create a list of people who listened to bands whose names start with "The".


							 SELECT
							     DISTINCT user_id
							 FROM
							     listen_table
							 WHERE
							     artist LIKE "The%"
						 

Note: sql alchemy (python api) gets angry about % signs (use %%)

Group By Statement

Create a list of the 10 people that listened to the most songs by "Lauryn Hill".


							 SELECT
							     user_id,
							     COUNT(song_id) AS song_count
							 FROM
							     listen_table
							 WHERE
							     artist = "Lauryn Hill"
							 GROUP BY
							     user_id
							 ORDER BY
							     song_count DESC
							 LIMIT 10
						 

Group By Statement: The Return

Create a list of the people that listened to more than 10 songs by "Jurassic 5".


							SELECT
							    user_id,
							    COUNT(song_id) AS song_count
							FROM
							    listen_table
							WHERE
							    artist = "Jurassic 5"
							GROUP BY
							    user_id
							HAVING
							    COUNT(song_id) > 10
						

Note: Can't use select alias in where/having :/

The Join

Tables are not circles

In joins the important question is what happens to missing/repeated values

artist_table

artist_id artist male_lead music_genre
5 "A Tribe Called Quest" 1 "Hip Hop"
1 "Wu-Tang Clan" 1 "Hip Hop"
2 "The Fugees" 0 "R and B"

Join Question

Create a list of the 10 people who listened to the most "Hip Hop".


						SELECT
						    lt.user_id,
						    COUNT(DISTINCT lt.song_id ) AS listen_count
						FROM
						    listen_table AS lt
						    INNER JOIN artist_table AS at
						      ON lt.artist_id = at.artist_id
						WHERE
						    at.music_genre = "Hip Hop"
						GROUP BY
						    lt.user_id
						ORDER BY
						    listen_count
						LIMIT 10
					

Sub-Queries

Create a list of the 10 people who listened to the most songs by unique "R and B" artists.


						SELECT
						    lt.user_id,
						    COUNT(lt.artists) AS artists_count
						FROM
						    (SELECT
						         DISTINCT user_id, artist
						     FROM
						         listen_table
						    ) AS lt
						    INNER JOIN artist_table AS at
						      ON lt.artist_id = at.artist_id
						WHERE
						    at.music_genre = "R and B"
						GROUP BY
						    lt.user_id
						ORDER BY
						    artist_count
						LIMIT 10
					

SQL makes my head hurt

Sub-Queries with a temp table

Create a list of the 10 people who listened to the most songs by unique "R and B" artists.


						WITH temp_table AS (
						    SELECT
						        DISTINCT user_id, artist
						    FROM
						        listen_table
						)
						SELECT
						    at.user_id,
						    COUNT(at.artist) AS artist_count
						FROM
						    temp_table AS tt
						    INNER JOIN artist_table AS at
						      ON tt.artist_id = at.artist_id
						WHERE
						    at.music_genre = "R and B"
						GROUP BY
						    at.user_id
						ORDER BY
						    artist_count
						LIMIT 10
					

Much better!

In Statements

Create a list of people who listened to "The Beatles" and "Kanye West".


						WITH temp_table AS (
						    SELECT
						        DISTINCT user_id  AS user_id
						    FROM
						        listen_table
						    WHERE
						        artist = 'The Beatles'
						)
						SELECT
						    DISTINCT user_id AS user_id
						FROM
						    listen_table
						WHERE
						    artist = 'Kanye West'
						AND
						    user_id IN (SELECT user_id FROM temp_table)
					

Exist Statements

Create a list of people who listened to "The Beatles" and "Kanye West".


						WITH temp_table AS (
						    SELECT
						        DISTINCT user_id AS user_id
						    FROM
						        listen_table
						    WHERE
						        artist = 'The Beatles'
						)
						SELECT
						    DISTINCT user_id AS user_id
						FROM
						    listen_table AS lt
						WHERE
						    artist = 'Kanye West'
						    AND
						    EXISTS (
							SELECT
					              1
						        FROM
						            temp_table
						        WHERE
						            temp_table.user_id = lt.user_id)
					

Case Statements

Count whether more songs are listened to on the weekends or weekdays.


					SELECT
          CASE
              WHEN extract('dow' from listen_date) > 5 THEN 'weekend'
              ELSE 'weekday'
          END AS weekend_or_not,
          COUNT(*)
					FROM
					    listen_table AS lt
					GROUP BY
					    weekend_or_not
				

Window Functions

Create a list of users and the first artist they listened to.


					WITH temp_table AS (
	        SELECT
					        user_id,
					        artist,
					        rank() OVER (PARTITION BY user_id ORDER BY listen_date ASC)
					    FROM
					        listen_table
					)
					SELECT
					    user_id,
					    artist
					FROM
					    temp_table
					WHERE
					    rank = 1
				

Optimization/Run Order

  1. FROM
  2. ON
  3. OUTER
  4. WHERE
  5. GROUP BY
  6. HAVING
  7. SELECT
  8. DISTINCT
  9. ORDER BY
  10. LIMIT/TOP

Temp tables are faster when many tables. Joins are faster when using fewer tables.

Watch out for null values

The way that SQL handles null values is often not how you would expect.

Resources

SQL can be fun!