Colab Practice: Problem Solving with SQL
Step 0: Play with PostgreSQL in Colab
Before diving into the specifics, set up your interactive environment.
Open PostgreSQL Colab Playground
Here's what you need to do:
-
Set up PostgreSQL: Install and initiate a Postgres instance in the cloud.
-
Explore the Schema: Familiarize yourself with the
Users,Songs, andListenstables. -
Run Scratchpad Queries: Test your SQL skills with some ad-hoc queries before tackling the exercises.
Practice 1: Writing Debug Tables
Creating a Debug Table is your go-to strategy for preemptively catching logic errors. It forces you to think like the database: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT.
Tip: Check your work: All answers and interactive logic traces are available in the Practice Solutions Colab.
1.1: Filtering Strategies (JOIN vs IN vs EXISTS)
Analyze these four queries. Do they all pull the same genres? Which is the most efficient for checking existence in a large dataset?
-- Query 1: INNER JOIN
SELECT DISTINCT s.genre FROM Songs s
INNER JOIN Listens l ON s.song_id = l.song_id;
-- Query 2: INNER JOIN (No Distinct - what happens?)
SELECT s.genre FROM Songs s
INNER JOIN Listens l ON s.song_id = l.song_id;
-- Query 3: IN Clause
SELECT DISTINCT s.genre FROM Songs s
WHERE s.song_id IN (SELECT song_id FROM Listens);
-- Query 4: EXISTS Clause
SELECT DISTINCT s.genre FROM Songs s
WHERE EXISTS (SELECT 1 FROM Listens l WHERE l.song_id = s.song_id);
1.2: Set Operations (UNION vs. EXCEPT)
Mickey (User 1) has both listening history and recommendations. Examine these queries to see how his habits align with our AI's suggestions.
-- Query 5: Mickey's Listening History
SELECT s.genre FROM Songs s INNER JOIN Listens l ON s.song_id = l.song_id WHERE l.user_id = 1;
-- Query 6: Mickey's Recommendations
SELECT s.genre FROM Songs s INNER JOIN Recommendations r ON s.song_id = r.song_id WHERE r.user_id = 1;
-- Query 7: UNION (History OR Recommendations)
SELECT s.genre FROM Songs s INNER JOIN Listens l ON s.song_id = l.song_id WHERE l.user_id = 1
UNION
SELECT s.genre FROM Songs s INNER JOIN Recommendations r ON s.song_id = r.song_id WHERE r.user_id = 1;
-- Query 8: EXCEPT (History BUT NOT recommended)
SELECT s.genre FROM Songs s INNER JOIN Listens l ON s.song_id = l.song_id WHERE l.user_id = 1
EXCEPT
SELECT s.genre FROM Songs s INNER JOIN Recommendations r ON s.song_id = r.song_id WHERE r.user_id = 1;
1.3: The LEFT JOIN
Check this query for User 1. Does it yield more or fewer rows than an INNER JOIN? Why?
-- Query 9: All genres, matched against Mickey's listens
SELECT DISTINCT s.genre
FROM Songs s
LEFT JOIN Listens l ON s.song_id = l.song_id AND l.user_id = 1;
The Practice Dataset
For the advanced challenges, ensure your manual traces include these new records.
Added Users
| user_id | name | |
|---|---|---|
| 11 | Bob | bob@example.com |
| 12 | Alice | alice@example.com |
| 13 | Eve | eve@example.com |
| 14 | Charlie | charlie@example.com |
| 15 | David | david@example.com |
Added Songs
| song_id | title | artist | genre |
|---|---|---|---|
| 11 | Shake it off | Taylor Swift | Pop |
| 12 | Karma | Taylor Swift | Pop |
| 13 | Anti-hero | Taylor Swift | Rock |
| 14 | Enchanted | Taylor Swift | Pop |
| 15 | Bad blood | Taylor Swift | Rock |
| 16 | Let it be | Beatles | Classic |
| 17 | Here comes the sun | Beatles | Classic |
| 18 | Twist and shout | Beatles | Classic |
Practice 2: Building Intuition with Window Functions
Goal: Experiment with queries that seem similar to understand their differences and the results they produce.
2.1: The Base Count (qry_count_songs)
Find the total listens for each song, grouped by genre, song ID, and title. This is your baseline for ranking queries.
2.2: Ranking by Genre (qry_top_songs_genre)
Identify the top-k ranked songs per genre based on listens (try k = 1, 2, and 3).
- How does competition shift when ranking within a genre versus the entire dataset?
2.3: Ranking by Artist (qry_top_songs_artist)
Identify the top-k ranked songs per artist based on listens.
- If an artist has multiple songs, which one leads?
2.4: Finding the Global Peak (qry_max_listens)
Find the song with the highest number of listens across the platform.
2.5: The Artist Champion (qry_max_listens_artist)
Identify the song with the most listens for each artist.
- This is essentially a "Top-1 per Artist" query. Compare this to 2.3.
2.6: Multi-Dimensional Ranking (qry_top_songs_genre_artist)
Find the top-k ranked songs per (genre, artist) based on listens.
- Observe how the partitions shrink. What happens to the rank values?
Challenge: Identical Results?
Run these two queries in the Colab. They yield the same results.
-
Why does Rank=1 per Genre match Rank=1 per (Genre, Artist) in this dataset?
-
Are these queries truly equivalent?
-- Query A: Rank 1 per Genre
runSql('Query top_songs_genre', qry_top_songs_genre.format(k=1))
-- Query B: Rank 1 per (Genre, Artist)
runSql('Query top_songs_genre_artist', qry_top_songs_genre_artist.format(k=1))
- What data condition would differentiate "Top Genre Song" from "Top Song for that Artist in that Genre"?
Ready to see the results?
All queries, answers, and extended datasets are ready for you in the official Colab.
Open Practice Solutions Colab