Colab Practice: Problem Solving with SQL

SQL Fundamentals Practice Problems

Step 0: Play with PostgreSQL in Colab

Before you dive into the specific practice problems below, we recommend setting up your own interactive environment.

Use this notebook to:

Set up PostgreSQL: Run the shell commands to install and start a real Postgres instance in the cloud.
Explore the Schema: Visualize the Users, Songs, and Listens tables.
Run Scratchpad Queries: Test your intuition by reading+writing ad-hoc SQL before you attempt the formal exercises.

Practice 1: Writing Debug Tables

Drawing a Debug Table is the single best way to catch logic errors before they hit production. It forces you to think like the database: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT.

Tip: Check your work: All answers and interactive logic traces are available in the Practice Solutions Colab.

1.1: Filtering Strategies (JOIN vs IN vs EXISTS)

Trace these four queries. Do they all return the same set of genres? Which one is the most efficient for checking existence in a large table?

-- Query 1: INNER JOIN
SELECT DISTINCT s.genre FROM Songs s
INNER JOIN Listens l ON s.song_id = l.song_id;

-- Query 2: INNER JOIN (No Distinct - what happens?)
SELECT s.genre FROM Songs s
INNER JOIN Listens l ON s.song_id = l.song_id;

-- Query 3: IN Clause
SELECT DISTINCT s.genre FROM Songs s
WHERE s.song_id IN (SELECT song_id FROM Listens);

-- Query 4: EXISTS Clause
SELECT DISTINCT s.genre FROM Songs s
WHERE EXISTS (SELECT 1 FROM Listens l WHERE l.song_id = s.song_id);

1.2: Set Operations (UNION vs. EXCEPT)

Mickey (User 1) has both history (Listens) and recommendations. Trace these queries to see how his listening habits overlap with our AI suggestions.

-- Query 5: Mickey's Listening History
SELECT s.genre FROM Songs s INNER JOIN Listens l ON s.song_id = l.song_id WHERE l.user_id = 1;

-- Query 6: Mickey's Recommendations
SELECT s.genre FROM Songs s INNER JOIN Recommendations r ON s.song_id = r.song_id WHERE r.user_id = 1;

-- Query 7: UNION (History OR Recommendations)
SELECT s.genre FROM Songs s INNER JOIN Listens l ON s.song_id = l.song_id WHERE l.user_id = 1
UNION
SELECT s.genre FROM Songs s INNER JOIN Recommendations r ON s.song_id = r.song_id WHERE r.user_id = 1;

-- Query 8: EXCEPT (History BUT NOT recommended)
SELECT s.genre FROM Songs s INNER JOIN Listens l ON s.song_id = l.song_id WHERE l.user_id = 1
EXCEPT
SELECT s.genre FROM Songs s INNER JOIN Recommendations r ON s.song_id = r.song_id WHERE r.user_id = 1;

1.3: The LEFT JOIN

Trace this query for User 1. Does it return more or fewer rows than a regular INNER JOIN? Why?

-- Query 9: All genres, matched against Mickey's listens
SELECT DISTINCT s.genre 
FROM Songs s 
LEFT JOIN Listens l ON s.song_id = l.song_id AND l.user_id = 1;

The Practice Dataset

For the following advanced challenges, ensure your manual traces account for these new records.

Added Users

user_id	name	email
11	Bob	bob@example.com
12	Alice	alice@example.com
13	Eve	eve@example.com
14	Charlie	charlie@example.com
15	David	david@example.com

Added Songs

song_id	title	artist	genre
11	Shake it off	Taylor Swift	Pop
12	Karma	Taylor Swift	Pop
13	Anti-hero	Taylor Swift	Rock
14	Enchanted	Taylor Swift	Pop
15	Bad blood	Taylor Swift	Rock
16	Let it be	Beatles	Classic
17	Here comes the sun	Beatles	Classic
18	Twist and shout	Beatles	Classic

Practice 2: Building Intuition with Window Functions

Goal: Let's play with a few queries that "feel" related, so we gain some intuition on how they are different and produce different result sets.

2.1: The Base Count (`qry_count_songs`)

Find the total number of listens for each song, grouped by genre, song ID, and title. This is your baseline for all ranking queries.

2.2: Ranking by Genre (`qry_top_songs_genre`)

Find the top-k ranked songs per genre based on the number of listens (try k = 1, 2, and 3).

How does the "competition" change when you rank within a specific genre vs. the whole dataset?

2.3: Ranking by Artist (`qry_top_songs_artist`)

Find the top-k ranked songs per artist based on the number of listens.

If an artist has multiple songs, which one stays on top?

2.4: Finding the Global Peak (`qry_max_listens`)

Find the single song with the absolute maximum number of listens across the entire platform.

2.5: The Artist Champion (`qry_max_listens_artist`)

Find the song with the maximum number of listens for each artist.

This is essentially a "Top-1 per Artist" query. Compare this result to 2.3.

2.6: Multi-Dimensional Ranking (`qry_top_songs_genre_artist`)

Find the top-k ranked songs per (genre, artist) based on the number of listens.

Note how the partitions become much smaller. What happens to the rank values?

Challenge: Identical Results?

Run these two queries in the Colab. They produce the exact same result set.

Why does Rank=1 per Genre produce the same data as Rank=1 per (Genre, Artist) in our current dataset?
Are these queries equivalent?

-- Query A: Rank 1 per Genre
runSql('Query top_songs_genre', qry_top_songs_genre.format(k=1))

-- Query B: Rank 1 per (Genre, Artist)
runSql('Query top_songs_genre_artist', qry_top_songs_genre_artist.format(k=1))

What data condition is required to make "Top Genre Song" different from "Top Song for that Artist in that Genre"?

Ready to see the results?

All queries, answers, and extended datasets are ready for you in the official Colab.

Open Practice Solutions Colab