SELECT-FROM-WHERE: The SQL Foundation
Concept. SELECT lists the columns you want back; FROM names the table; WHERE keeps only the rows whose predicate is true.
Intuition. Start with the Listens table (FROM), drop the rows whose predicate is false (WHERE), keep only the columns you named (SELECT). The clauses execute in that order: FROM → WHERE → SELECT.
Example. Use WHERE to trim the table to just the rows you need (e.g. 9 rows down to 5). Then SELECT the columns you care about (e.g. from all columns to just 3).
Basic Query Structure: FloW...S (like water)
-- Get listens with high ratings (above 4.0)
SELECT user_id, song_id, rating -- Step last: Choose only 3 columns for output
FROM Listens -- Step 1: Picks table with 9 rows
WHERE rating > 4.0 -- Step 2: Filters 9 rows; keeps subset of rows
-
Execution Order: FROM → WHERE → SELECT (FWS order, not SFW. FloWS like water)
-
Row Filtering: WHERE evaluates each row individually
-
Column Selection: SELECT determines output columns
Debug Table (row by row)
| listen_id | user_id | song_id | rating | Action | Show in Result | |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 4.5 | ✓ TRUE, Keep | ✓ | |
| 2 | 1 | 2 | 4.2 | ✓ TRUE, Keep | ✓ | |
| 3 | 1 | 6 | 3.9 | ✗ FALSE, Remove | ✗ | |
| 4 | 2 | 2 | 4.7 | ✓ TRUE, Keep | ✓ | |
| 5 | 2 | 7 | 4.6 | ✓ TRUE, Keep | ✓ | |
| 6 | 2 | 8 | 3.9 | ✗ FALSE, Remove | ✗ | |
| 7 | 3 | 1 | 2.9 | ✗ FALSE, Remove | ✗ | |
| 8 | 3 | 2 | 4.9 | ✓ TRUE, Keep | ✓ | |
| 9 | 3 | 6 | NULL | ? UNKNOWN, Remove | ✗ |
Output
| user_id | song_id | rating | |
|---|---|---|---|
| 1 | 1 | 4.5 | |
| 1 | 2 | 4.2 | |
| 2 | 2 | 4.7 | |
| 2 | 7 | 4.6 | |
| 3 | 2 | 4.9 |
Why SQL FloWS Like This?
SQL is declarative. But to start, think of it like Python:
def high_ratings(Listens):
output = []
for row in Listens: # FROM Listens
if row.rating > 4.0: # WHERE rating > 4.0
output += [row.user_id, # SELECT user_id,
row.song_id, # song_id,
row.rating] # rating
return output
Declarative (SQL): "Get me listens with rating > 4.0" → Database figures out where/how
Procedural (Python): "Here's my list data struct from this file on this machine. Loop through each row, check condition, build output. Parallelize with these step1, step2, step3." → You spell out each step
Key insight: SELECT is like the final mapping - it comes last in execution but defines what you return!
SELECT * vs Specific Columns
-- Get all columns for user 1's listening history
SELECT *
FROM Listens
WHERE user_id = 1
| listen_id | user_id | song_id | rating | listen_time | |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 4.5 | 2024-08-30 14:35:00 | |
| 2 | 1 | 2 | 4.2 | NULL | |
| 3 | 1 | 6 | 3.9 | 2024-08-29 10:15:00 |
-- Get only user_id and rating for user 1
SELECT user_id, rating
FROM Listens
WHERE user_id = 1
| user_id | rating | |
|---|---|---|
| 1 | 4.5 | |
| 1 | 4.2 | |
| 1 | 3.9 |
WHERE with Multiple Conditions
-- Get user 1's high-rated songs (rating > 4.0)
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 AND rating > 4.0
| listen_id | user_id | rating | AND Result | Action | |
|---|---|---|---|---|---|
| 1 | 1 | 4.5 | ✓ TRUE | Keep | |
| 2 | 1 | 4.2 | ✓ TRUE | Keep | |
| 3 | 1 | 3.9 | ✗ FALSE | Remove | |
| 4 | 2 | 4.7 | ✗ FALSE | Remove | |
| 5-9 | ... | ... | ✗ FALSE | Remove |
-- Get all user 1's listens OR any listen rated > 4.5
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 OR rating > 4.5
| listen_id | user_id | rating | Cond 1: user_id=1 | Cond 2: rating>4.5 | OR Result | Action | |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 4.5 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 2 | 1 | 4.2 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 3 | 1 | 3.9 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 4 | 2 | 4.7 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 5 | 2 | 4.6 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 6 | 2 | 3.9 | ✗ FALSE | ✗ FALSE | ✗ FALSE | Remove | |
| 7 | 3 | 2.9 | ✗ FALSE | ✗ FALSE | ✗ FALSE | Remove | |
| 8 | 3 | 4.9 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 9 | 3 | NULL | ✗ FALSE | ? UNKNOWN | ? UNKNOWN | Remove |
SELECT with Calculations
-- Calculate double rating and categorize by quality
SELECT
user_id,
rating,
rating * 2 AS double_rating, -- AS means renaming columns
CASE -- CASE/WHEN: Creates conditional logic - like if-then-else in SQL
WHEN rating >= 4.5 THEN 'Excellent'
WHEN rating >= 4.0 THEN 'Good'
ELSE 'Average'
END AS category
FROM Listens
WHERE rating IS NOT NULL
| user_id | rating | double_rating | Category | Annotations | |
|---|---|---|---|---|---|
| 1 | 4.5 | 9.0 | Excellent | 4.5 >= 4.5? Yes | |
| 1 | 4.2 | 8.4 | Good | 4.2 >= 4.5? No 4.2 >= 4.0? Yes | |
| 1 | 3.9 | 7.8 | Average | 3.9 >= 4.5? No 3.9 >= 4.0? No | |
| 2 | 4.7 | 9.4 | Excellent | 4.7 >= 4.5? Yes | |
| 2 | 4.6 | 9.2 | Excellent | 4.6 >= 4.5? Yes | |
| 2 | 3.9 | 7.8 | Average | 3.9 >= 4.5? No 3.9 >= 4.0? No | |
| 3 | 2.9 | 5.8 | Average | 2.9 >= 4.5? No 2.9 >= 4.0? No | |
| 3 | 4.9 | 9.8 | Excellent | 4.9 >= 4.5? Yes |
Common Patterns
Pattern 1: Filter and Transform
-- Round ratings between 4.0 and 5.0
SELECT
user_id,
ROUND(rating, 0) AS rounded_rating
FROM Listens
WHERE rating BETWEEN 4.0 AND 5.0
| Step | Operation | Effect |
|---|---|---|
| 1 | FROM Listens | Load 9 rows |
| 2 | WHERE BETWEEN 4.0 AND 5.0 | Filter to 5 rows |
| 3 | SELECT with ROUND | Transform rating values |
Understanding Aliases (AS Keyword)
What is an Alias?
An alias creates a new name for a column in your output using the AS keyword:
-- FAILS: Alias 'double_rating' doesn't exist in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8 -- ERROR!
-- WORKS: Use original expression in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8
Why It Matters:
| Clause | Execution Order | Can Reference | Cannot Reference |
|---|---|---|---|
| FROM | 1st | Table names | SELECT aliases |
| WHERE | 2nd | Table columns | SELECT aliases |
| SELECT | 3rd | Table columns, expressions | Future aliases |