SELECT-FROM-WHERE: The SQL Foundation
Core Concept
- Example: Use WHERE to filter the table down to only the rows you want (e.g. 9 rows → 5 rows). Then use SELECT to pick only the columns you need (e.g. from all columns → just 3 columns).
🌊 Remember: SQL FloWS Like Water!
FROM → WHERE → SELECT
Even though you write SELECT first, SQL executes in FloWS order:
-
FROM: Load the table
-
WHERE: Filter rows
-
SELECT: Choose columns and create aliases
💡 Key insight: Aliases (AS) are created in step 3, so you can't use them in step 2 (WHERE)!
Key Rules
-
Execution Order: FROM → WHERE → SELECT (FWS order, not SFW. FloWS like water)
-
Row Filtering: WHERE evaluates each row individually
-
Column Selection: SELECT determines output columns
Basic Query Structure: FloW...S (like water)
-- Get listens with high ratings (above 4.0)
SELECT user_id, song_id, rating -- Step last: Choose only 3 columns for output
FROM Listens -- Step 1: Picks table with 9 rows
WHERE rating > 4.0 -- Step 2: Filters 9 rows; keeps subset of rows
Debug Table (row by row)
| listen_id | user_id | song_id | rating | Action | Show in Result | |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 4.5 | ✓ TRUE, Keep | ✓ | |
| 2 | 1 | 2 | 4.2 | ✓ TRUE, Keep | ✓ | |
| 3 | 1 | 6 | 3.9 | ✗ FALSE, Remove | ✗ | |
| 4 | 2 | 2 | 4.7 | ✓ TRUE, Keep | ✓ | |
| 5 | 2 | 7 | 4.6 | ✓ TRUE, Keep | ✓ | |
| 6 | 2 | 8 | 3.9 | ✗ FALSE, Remove | ✗ | |
| 7 | 3 | 1 | 2.9 | ✗ FALSE, Remove | ✗ | |
| 8 | 3 | 2 | 4.9 | ✓ TRUE, Keep | ✓ | |
| 9 | 3 | 6 | NULL | ? UNKNOWN, Remove | ✗ |
Output
| user_id | song_id | rating | |
|---|---|---|---|
| 1 | 1 | 4.5 | |
| 1 | 2 | 4.2 | |
| 2 | 2 | 4.7 | |
| 2 | 7 | 4.6 | |
| 3 | 2 | 4.9 |
Why SQL FloWS Like This?
def high_ratings(Listens):
output = []
for row in Listens: # FROM Listens
if row.rating > 4.0: # WHERE rating > 4.0
output += [row.user_id, # SELECT user_id,
row.song_id, # song_id,
row.rating] # rating
return output
Declarative (SQL): "Get me listens with rating > 4.0" → Database figures out where/how
Procedural (Python): "Here's my list data struct from this file on this machine. Loop through each row, check condition, build output. Parallelize with these step1, step2, step3." → You spell out each step
SELECT * vs Specific Columns
-- Get all columns for user 1's listening history
SELECT *
FROM Listens
WHERE user_id = 1
| listen_id | user_id | song_id | rating | listen_time | |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 4.5 | 2024-08-30 14:35:00 | |
| 2 | 1 | 2 | 4.2 | NULL | |
| 3 | 1 | 6 | 3.9 | 2024-08-29 10:15:00 |
-- Get only user_id and rating for user 1
SELECT user_id, rating
FROM Listens
WHERE user_id = 1
| user_id | rating | |
|---|---|---|
| 1 | 4.5 | |
| 1 | 4.2 | |
| 1 | 3.9 |
WHERE with Multiple Conditions
-- Get user 1's high-rated songs (rating > 4.0)
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 AND rating > 4.0
| listen_id | user_id | rating | AND Result | Action | |
|---|---|---|---|---|---|
| 1 | 1 | 4.5 | ✓ TRUE | Keep | |
| 2 | 1 | 4.2 | ✓ TRUE | Keep | |
| 3 | 1 | 3.9 | ✗ FALSE | Remove | |
| 4 | 2 | 4.7 | ✗ FALSE | Remove | |
| 5-9 | ... | ... | ✗ FALSE | Remove |
-- Get all user 1's listens OR any listen rated > 4.5
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 OR rating > 4.5
| listen_id | user_id | rating | Cond 1: user_id=1 | Cond 2: rating>4.5 | OR Result | Action | |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 4.5 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 2 | 1 | 4.2 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 3 | 1 | 3.9 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 4 | 2 | 4.7 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 5 | 2 | 4.6 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 6 | 2 | 3.9 | ✗ FALSE | ✗ FALSE | ✗ FALSE | Remove | |
| 7 | 3 | 2.9 | ✗ FALSE | ✗ FALSE | ✗ FALSE | Remove | |
| 8 | 3 | 4.9 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 9 | 3 | NULL | ✗ FALSE | ? UNKNOWN | ? UNKNOWN | Remove |
SELECT with Calculations
-- Calculate double rating and categorize by quality
SELECT
user_id,
rating,
rating * 2 AS double_rating, -- AS means renaming columns
CASE -- CASE/WHEN: Creates conditional logic - like if-then-else in SQL
WHEN rating >= 4.5 THEN 'Excellent'
WHEN rating >= 4.0 THEN 'Good'
ELSE 'Average'
END AS category
FROM Listens
WHERE rating IS NOT NULL
| user_id | rating | double_rating | Category | Annotations | |
|---|---|---|---|---|---|
| 1 | 4.5 | 9.0 | Excellent | 4.5 >= 4.5? Yes | |
| 1 | 4.2 | 8.4 | Good | 4.2 >= 4.5? No 4.2 >= 4.0? Yes | |
| 1 | 3.9 | 7.8 | Average | 3.9 >= 4.5? No 3.9 >= 4.0? No | |
| 2 | 4.7 | 9.4 | Excellent | 4.7 >= 4.5? Yes | |
| 2 | 4.6 | 9.2 | Excellent | 4.6 >= 4.5? Yes | |
| 2 | 3.9 | 7.8 | Average | 3.9 >= 4.5? No 3.9 >= 4.0? No | |
| 3 | 2.9 | 5.8 | Average | 2.9 >= 4.5? No 2.9 >= 4.0? No | |
| 3 | 4.9 | 9.8 | Excellent | 4.9 >= 4.5? Yes |
bugfix: sept24. column header fixed to double_rating
Common Patterns
Pattern 1: Filter and Transform
-- Round ratings between 4.0 and 5.0
SELECT
user_id,
ROUND(rating, 0) AS rounded_rating
FROM Listens
WHERE rating BETWEEN 4.0 AND 5.0
| Step | Operation | Effect |
|---|---|---|
| 1 | FROM Listens | Load 9 rows |
| 2 | WHERE BETWEEN 4.0 AND 5.0 | Filter to 5 rows |
| 3 | SELECT with ROUND | Transform rating values |
Execution Order Matters
-- FAILS: Alias 'double_rating' doesn't exist in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8 -- ERROR!
-- WORKS: Use original expression in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8
Why It Matters:
| Clause | Execution Order | Can Reference | Cannot Reference |
|---|---|---|---|
| FROM | 1st | Table names | SELECT aliases |
| WHERE | 2nd | Table columns | SELECT aliases |
| SELECT | 3rd | Table columns, expressions | Future aliases |
Understanding Aliases (AS Keyword)
What is an Alias?
An alias creates a new name for a column in your output using the AS keyword:
SELECT rating AS r -- Creates alias 'r' for rating column
SELECT rating * 2 AS doubled -- Creates alias 'doubled' for calculated value
Think of AS as a "rename" that happens at the very end of query execution.
🚨 The Alias Timing Problem
Why this fails:
-- ❌ ERROR: Can't use alias 'r' in WHERE
SELECT rating AS r
FROM Listens
WHERE r > 4.0 -- 'r' doesn't exist yet!
The execution timeline:
Step 1: FROM Listens → TABLE loaded, columns: rating, user_id, song_id...
Step 2: WHERE r > 4.0 → Error! 'r' doesn't exist yet
Step 3: SELECT rating AS r → NOW 'r' alias IS created
The correct way:
-- ✅ WORKS: Use actual column name in WHERE
SELECT rating AS r
FROM Listens
WHERE rating > 4.0 -- Use original column name
More Alias Examples
-- ❌ WRONG: Using calculated alias in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8 -- Error! 'double_rating' created after WHERE
-- ✅ CORRECT: Repeat the calculation in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8 -- Calculate directly in WHERE
-- ✅ ALSO CORRECT: Use original column
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating > 4 -- Use base column for filtering
Quick Rule: Aliases live in SELECT, not WHERE
| Clause | Can Use | Cannot Use |
|---|---|---|
| WHERE | Table columns (rating) |
SELECT aliases (r) |
| SELECT | Table columns + Aliases | Future aliases |