SELECT-FROM-WHERE: The SQL Foundation
Core Concept
- Example: Use WHERE to filter the table down to only the rows you want (e.g. 9 rows → 5 rows). Then use SELECT to pick only the columns you need (e.g. from all columns → just 3 columns).
Basic Query Structure: FloW...S (like water)
-- Get listens with high ratings (above 4.0)
SELECT user_id, song_id, rating -- Step last: Choose only 3 columns for output
FROM Listens -- Step 1: Picks table with 9 rows
WHERE rating > 4.0 -- Step 2: Filters 9 rows; keeps subset of rows
-
Execution Order: FROM → WHERE → SELECT (FWS order, not SFW. FloWS like water)
-
Row Filtering: WHERE evaluates each row individually
-
Column Selection: SELECT determines output columns
Debug Table (row by row)
| listen_id | user_id | song_id | rating | Action | Show in Result | |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 4.5 | ✓ TRUE, Keep | ✓ | |
| 2 | 1 | 2 | 4.2 | ✓ TRUE, Keep | ✓ | |
| 3 | 1 | 6 | 3.9 | ✗ FALSE, Remove | ✗ | |
| 4 | 2 | 2 | 4.7 | ✓ TRUE, Keep | ✓ | |
| 5 | 2 | 7 | 4.6 | ✓ TRUE, Keep | ✓ | |
| 6 | 2 | 8 | 3.9 | ✗ FALSE, Remove | ✗ | |
| 7 | 3 | 1 | 2.9 | ✗ FALSE, Remove | ✗ | |
| 8 | 3 | 2 | 4.9 | ✓ TRUE, Keep | ✓ | |
| 9 | 3 | 6 | NULL | ? UNKNOWN, Remove | ✗ |
Output
| user_id | song_id | rating | |
|---|---|---|---|
| 1 | 1 | 4.5 | |
| 1 | 2 | 4.2 | |
| 2 | 2 | 4.7 | |
| 2 | 7 | 4.6 | |
| 3 | 2 | 4.9 |
Why SQL FloWS Like This?
SQL is declarative. But to start, think of it like Python:
def high_ratings(Listens):
output = []
for row in Listens: # FROM Listens
if row.rating > 4.0: # WHERE rating > 4.0
output += [row.user_id, # SELECT user_id,
row.song_id, # song_id,
row.rating] # rating
return output
Declarative (SQL): "Get me listens with rating > 4.0" → Database figures out where/how
Procedural (Python): "Here's my list data struct from this file on this machine. Loop through each row, check condition, build output. Parallelize with these step1, step2, step3." → You spell out each step
SELECT * vs Specific Columns
-- Get all columns for user 1's listening history
SELECT *
FROM Listens
WHERE user_id = 1
| listen_id | user_id | song_id | rating | listen_time | |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 4.5 | 2024-08-30 14:35:00 | |
| 2 | 1 | 2 | 4.2 | NULL | |
| 3 | 1 | 6 | 3.9 | 2024-08-29 10:15:00 |
-- Get only user_id and rating for user 1
SELECT user_id, rating
FROM Listens
WHERE user_id = 1
| user_id | rating | |
|---|---|---|
| 1 | 4.5 | |
| 1 | 4.2 | |
| 1 | 3.9 |
WHERE with Multiple Conditions
-- Get user 1's high-rated songs (rating > 4.0)
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 AND rating > 4.0
| listen_id | user_id | rating | AND Result | Action | |
|---|---|---|---|---|---|
| 1 | 1 | 4.5 | ✓ TRUE | Keep | |
| 2 | 1 | 4.2 | ✓ TRUE | Keep | |
| 3 | 1 | 3.9 | ✗ FALSE | Remove | |
| 4 | 2 | 4.7 | ✗ FALSE | Remove | |
| 5-9 | ... | ... | ✗ FALSE | Remove |
-- Get all user 1's listens OR any listen rated > 4.5
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 OR rating > 4.5
| listen_id | user_id | rating | Cond 1: user_id=1 | Cond 2: rating>4.5 | OR Result | Action | |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 4.5 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 2 | 1 | 4.2 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 3 | 1 | 3.9 | ✓ TRUE | ✗ FALSE | ✓ TRUE | Keep | |
| 4 | 2 | 4.7 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 5 | 2 | 4.6 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 6 | 2 | 3.9 | ✗ FALSE | ✗ FALSE | ✗ FALSE | Remove | |
| 7 | 3 | 2.9 | ✗ FALSE | ✗ FALSE | ✗ FALSE | Remove | |
| 8 | 3 | 4.9 | ✗ FALSE | ✓ TRUE | ✓ TRUE | Keep | |
| 9 | 3 | NULL | ✗ FALSE | ? UNKNOWN | ? UNKNOWN | Remove |
SELECT with Calculations
-- Calculate double rating and categorize by quality
SELECT
user_id,
rating,
rating * 2 AS double_rating, -- AS means renaming columns
CASE -- CASE/WHEN: Creates conditional logic - like if-then-else in SQL
WHEN rating >= 4.5 THEN 'Excellent'
WHEN rating >= 4.0 THEN 'Good'
ELSE 'Average'
END AS category
FROM Listens
WHERE rating IS NOT NULL
| user_id | rating | double_rating | Category | Annotations | |
|---|---|---|---|---|---|
| 1 | 4.5 | 9.0 | Excellent | 4.5 >= 4.5? Yes | |
| 1 | 4.2 | 8.4 | Good | 4.2 >= 4.5? No 4.2 >= 4.0? Yes | |
| 1 | 3.9 | 7.8 | Average | 3.9 >= 4.5? No 3.9 >= 4.0? No | |
| 2 | 4.7 | 9.4 | Excellent | 4.7 >= 4.5? Yes | |
| 2 | 4.6 | 9.2 | Excellent | 4.6 >= 4.5? Yes | |
| 2 | 3.9 | 7.8 | Average | 3.9 >= 4.5? No 3.9 >= 4.0? No | |
| 3 | 2.9 | 5.8 | Average | 2.9 >= 4.5? No 2.9 >= 4.0? No | |
| 3 | 4.9 | 9.8 | Excellent | 4.9 >= 4.5? Yes |
Common Patterns
Pattern 1: Filter and Transform
-- Round ratings between 4.0 and 5.0
SELECT
user_id,
ROUND(rating, 0) AS rounded_rating
FROM Listens
WHERE rating BETWEEN 4.0 AND 5.0
| Step | Operation | Effect |
|---|---|---|
| 1 | FROM Listens | Load 9 rows |
| 2 | WHERE BETWEEN 4.0 AND 5.0 | Filter to 5 rows |
| 3 | SELECT with ROUND | Transform rating values |
Understanding Aliases (AS Keyword)
What is an Alias?
An alias creates a new name for a column in your output using the AS keyword:
-- FAILS: Alias 'double_rating' doesn't exist in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8 -- ERROR!
-- WORKS: Use original expression in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8
Why It Matters:
| Clause | Execution Order | Can Reference | Cannot Reference |
|---|---|---|---|
| FROM | 1st | Table names | SELECT aliases |
| WHERE | 2nd | Table columns | SELECT aliases |
| SELECT | 3rd | Table columns, expressions | Future aliases |