SELECT-FROM-WHERE: The SQL Foundation
Core Concept
- Example: Use WHERE to filter the table down to only the rows you want (e.g. 9 rows → 5 rows). Then use SELECT to pick only the columns you need (e.g. from all columns → just 3 columns).
Key Rules
-
Execution Order: FROM → WHERE → SELECT (FWS order, not SFW. FloWS like water)
-
Row Filtering: WHERE evaluates each row individually
-
Column Selection: SELECT determines output columns
Basic Query Structure: FloW...S (like water)
-- Get listens with high ratings (above 4.0)
SELECT user_id, song_id, rating -- Step last: Choose only 3 columns for output
FROM Listens -- Step 1: Picks table with 9 rows
WHERE rating > 4.0 -- Step 2: Filters 9 rows; keeps subset of rows
Debug Table (row by row)
listen_id | user_id | song_id | rating | Action | Show in Result | |
---|---|---|---|---|---|---|
1 | 1 | 1 | 4.5 | ✓ TRUE, Keep | ✓ | |
2 | 1 | 2 | 4.2 | ✓ TRUE, Keep | ✓ | |
3 | 1 | 6 | 3.9 | ✗ FALSE, Remove | ✗ | |
4 | 2 | 2 | 4.7 | ✓ TRUE, Keep | ✓ | |
5 | 2 | 7 | 4.6 | ✓ TRUE, Keep | ✓ | |
6 | 2 | 8 | 3.9 | ✗ FALSE, Remove | ✗ | |
7 | 3 | 1 | 2.9 | ✗ FALSE, Remove | ✗ | |
8 | 3 | 2 | 4.9 | ✓ TRUE, Keep | ✓ | |
9 | 3 | 6 | NULL | ? UNKNOWN, Remove | ✗ |
Output
user_id | song_id | rating | |
---|---|---|---|
1 | 1 | 4.5 | |
1 | 2 | 4.2 | |
2 | 2 | 4.7 | |
2 | 7 | 4.6 | |
3 | 2 | 4.9 |
Why SQL FloWS Like This?
SQL is declarative. But to start, think of it like Python:
def high_ratings(Listens): output = [] for row in Listens: # FROM Listens if row.rating > 4.0: # WHERE rating > 4.0 output += [row.user_id, # SELECT user_id, row.song_id, # song_id, row.rating] # rating return output
Declarative (SQL): "Get me listens with rating > 4.0" → Database figures out where/how
Procedural (Python): "Here's my list data struct from this file on this machine. Loop through each row, check condition, build output. Parallelize with these step1, step2, step3." → You spell out each step
SELECT * vs Specific Columns
-- Get all columns for user 1's listening history
SELECT *
FROM Listens
WHERE user_id = 1
listen_id | user_id | song_id | rating | listen_time | |
---|---|---|---|---|---|
1 | 1 | 1 | 4.5 | 2024-08-30 14:35:00 | |
2 | 1 | 2 | 4.2 | NULL | |
3 | 1 | 6 | 3.9 | 2024-08-29 10:15:00 |
-- Get only user_id and rating for user 1
SELECT user_id, rating
FROM Listens
WHERE user_id = 1
user_id | rating | |
---|---|---|
1 | 4.5 | |
1 | 4.2 | |
1 | 3.9 |
WHERE with Multiple Conditions
-- Get user 1's high-rated songs (rating > 4.0)
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 AND rating > 4.0
listen_id | user_id | rating | AND Result | Action | |
---|---|---|---|---|---|
1 | 1 | 4.5 | ✓ TRUE | Keep | |
2 | 1 | 4.2 | ✓ TRUE | Keep | |
3 | 1 | 3.9 | ✗ FALSE | Remove | |
4 | 2 | 4.7 | ✗ FALSE | Remove | |
5-9 | ... | ... | ✗ FALSE | Remove |
-- Get all user 1's listens OR any listen rated > 4.5
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 OR rating > 4.5
listen_id | user_id | rating | Cond 1: user_id=1 | Cond 2: rating>4.5 | OR Result | Action | |
---|---|---|---|---|---|---|---|
1 | 1 | 4.5 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
2 | 1 | 4.2 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
3 | 1 | 3.9 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
4 | 2 | 4.7 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
5 | 2 | 4.6 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
6 | 2 | 3.9 | ✗ FALSE | ✗ FALSE | ✗ FALSE | Remove | |
7 | 3 | 2.9 | ✗ FALSE | ✗ FALSE | ✗ FALSE | Remove | |
8 | 3 | 4.9 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
9 | 3 | NULL | ✗ FALSE | ? UNKNOWN | ? UNKNOWN | Remove |
SELECT with Calculations
-- Calculate double rating and categorize by quality
SELECT
user_id,
rating,
rating * 2 AS double_rating, -- AS means renaming columns
CASE -- CASE/WHEN: Creates conditional logic - like if-then-else in SQL
WHEN rating >= 4.5 THEN 'Excellent'
WHEN rating >= 4.0 THEN 'Good'
ELSE 'Average'
END AS category
FROM Listens
WHERE rating IS NOT NULL
user_id | rating | double_rating | Category | Annotations | |
---|---|---|---|---|---|
1 | 4.5 | 9.0 | Excellent | 4.5 >= 4.5? Yes | |
1 | 4.2 | 8.4 | Good | 4.2 >= 4.5? No 4.2 >= 4.0? Yes | |
1 | 3.9 | 7.8 | Average | 3.9 >= 4.5? No 3.9 >= 4.0? No | |
2 | 4.7 | 9.4 | Excellent | 4.7 >= 4.5? Yes | |
2 | 4.6 | 9.2 | Excellent | 4.6 >= 4.5? Yes | |
2 | 3.9 | 7.8 | Average | 3.9 >= 4.5? No 3.9 >= 4.0? No | |
3 | 2.9 | 5.8 | Average | 2.9 >= 4.5? No 2.9 >= 4.0? No | |
3 | 4.9 | 9.8 | Excellent | 4.9 >= 4.5? Yes |
bugfix: sept24. column header fixed to double_rating
Common Patterns
Pattern 1: Filter and Transform
-- Round ratings between 4.0 and 5.0
SELECT
user_id,
ROUND(rating, 0) AS rounded_rating
FROM Listens
WHERE rating BETWEEN 4.0 AND 5.0
Step | Operation | Effect |
---|---|---|
1 | FROM Listens | Load 9 rows |
2 | WHERE BETWEEN 4.0 AND 5.0 | Filter to 5 rows |
3 | SELECT with ROUND | Transform rating values |
Execution Order Matters
-- FAILS: Alias 'double_rating' doesn't exist in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8 -- ERROR!
-- WORKS: Use original expression in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8
Why It Matters:
Clause | Execution Order | Can Reference | Cannot Reference |
---|---|---|---|
FROM | 1st | Table names | SELECT aliases |
WHERE | 2nd | Table columns | SELECT aliases |
SELECT | 3rd | Table columns, expressions | Future aliases |
Common Mistakes
The Alias Timing Problem
-- WRONG: Alias r not yet created in WHERE
SELECT rating AS r FROM Listens WHERE r > 4.0
-- CORRECT: Use actual column name in WHERE
SELECT rating AS r FROM Listens WHERE rating > 4.0