SELECT-FROM-WHERE: The SQL Foundation

Core Concept

Example: Use WHERE to filter the table down to only the rows you want (e.g. 9 rows → 5 rows). Then use SELECT to pick only the columns you need (e.g. from all columns → just 3 columns).

🌊 Remember: SQL FloWS Like Water!

FROM → WHERE → SELECT

Even though you write SELECT first, SQL executes in FloWS order:

FROM: Load the table
WHERE: Filter rows
SELECT: Choose columns and create aliases

💡 Key insight: Aliases (AS) are created in step 3, so you can't use them in step 2 (WHERE)!

Key Rules

Execution Order: FROM → WHERE → SELECT (FWS order, not SFW. FloWS like water)
Row Filtering: WHERE evaluates each row individually
Column Selection: SELECT determines output columns

Basic Query Structure: FloW...S (like water)

-- Get listens with high ratings (above 4.0)
SELECT user_id, song_id, rating    -- Step last: Choose only 3 columns for output
FROM Listens                       -- Step 1: Picks table with 9 rows
WHERE rating > 4.0                 -- Step 2: Filters 9 rows; keeps subset of rows

Debug Table (row by row)

listen_id	user_id	song_id	rating	Action	Show in Result
1	1	1	4.5	✓ TRUE, Keep	✓
2	1	2	4.2	✓ TRUE, Keep	✓
3	1	6	3.9	✗ FALSE, Remove	✗
4	2	2	4.7	✓ TRUE, Keep	✓
5	2	7	4.6	✓ TRUE, Keep	✓
6	2	8	3.9	✗ FALSE, Remove	✗
7	3	1	2.9	✗ FALSE, Remove	✗
8	3	2	4.9	✓ TRUE, Keep	✓
9	3	6	NULL	? UNKNOWN, Remove	✗

Output

user_id	song_id	rating
1	1	4.5
1	2	4.2
2	2	4.7
2	7	4.6
3	2	4.9

Why SQL FloWS Like This?

SQL is declarative. But to start, think of it like Python:

def high_ratings(Listens):
    output = []
    for row in Listens:              # FROM Listens
        if row.rating > 4.0:         # WHERE rating > 4.0  
            output += [row.user_id,  # SELECT user_id,
                       row.song_id,  #        song_id,
                       row.rating]   #        rating
    return output

Declarative (SQL): "Get me listens with rating > 4.0" → Database figures out where/how

Procedural (Python): "Here's my list data struct from this file on this machine. Loop through each row, check condition, build output. Parallelize with these step1, step2, step3." → You spell out each step

Key insight: SELECT is like the final mapping - it comes last in execution but defines what you return!

SELECT * vs Specific Columns

-- Get all columns for user 1's listening history
SELECT * 
FROM Listens 
WHERE user_id = 1

listen_id	user_id	song_id	rating	listen_time
1	1	1	4.5	2024-08-30 14:35:00
2	1	2	4.2	NULL
3	1	6	3.9	2024-08-29 10:15:00

-- Get only user_id and rating for user 1
SELECT user_id, rating 
FROM Listens 
WHERE user_id = 1

	user_id	rating
	1	4.5
	1	4.2
	1	3.9

WHERE with Multiple Conditions

-- Get user 1's high-rated songs (rating > 4.0)
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 AND rating > 4.0

listen_id	user_id	rating	AND Result	Action
1	1	4.5	✓ TRUE	Keep
2	1	4.2	✓ TRUE	Keep
3	1	3.9	✗ FALSE	Remove
4	2	4.7	✗ FALSE	Remove
5-9	...	...	✗ FALSE	Remove

-- Get all user 1's listens OR any listen rated > 4.5
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 OR rating > 4.5

listen_id	user_id	rating	Cond 1: user_id=1	Cond 2: rating>4.5	OR Result	Action
1	1	4.5	✓ TRUE	✗ FALSE	✓ TRUE	Keep
2	1	4.2	✓ TRUE	✗ FALSE	✓ TRUE	Keep
3	1	3.9	✓ TRUE	✗ FALSE	✓ TRUE	Keep
4	2	4.7	✗ FALSE	✓ TRUE	✓ TRUE	Keep
5	2	4.6	✗ FALSE	✓ TRUE	✓ TRUE	Keep
6	2	3.9	✗ FALSE	✗ FALSE	✗ FALSE	Remove
7	3	2.9	✗ FALSE	✗ FALSE	✗ FALSE	Remove
8	3	4.9	✗ FALSE	✓ TRUE	✓ TRUE	Keep
9	3	NULL	✗ FALSE	? UNKNOWN	? UNKNOWN	Remove

SELECT with Calculations

-- Calculate double rating and categorize by quality
SELECT 
    user_id,
    rating,
    rating * 2 AS double_rating,  -- AS means renaming columns
    CASE   -- CASE/WHEN: Creates conditional logic - like if-then-else in SQL
        WHEN rating >= 4.5 THEN 'Excellent'
        WHEN rating >= 4.0 THEN 'Good'
        ELSE 'Average'
    END AS category
FROM Listens
WHERE rating IS NOT NULL

user_id	rating	double_rating	Category	Annotations
1	4.5	9.0	Excellent	4.5 >= 4.5? Yes
1	4.2	8.4	Good	4.2 >= 4.5? No 4.2 >= 4.0? Yes
1	3.9	7.8	Average	3.9 >= 4.5? No 3.9 >= 4.0? No
2	4.7	9.4	Excellent	4.7 >= 4.5? Yes
2	4.6	9.2	Excellent	4.6 >= 4.5? Yes
2	3.9	7.8	Average	3.9 >= 4.5? No 3.9 >= 4.0? No
3	2.9	5.8	Average	2.9 >= 4.5? No 2.9 >= 4.0? No
3	4.9	9.8	Excellent	4.9 >= 4.5? Yes

bugfix: sept24. column header fixed to double_rating

Common Patterns

Pattern 1: Filter and Transform

-- Round ratings between 4.0 and 5.0
SELECT 
    user_id,
    ROUND(rating, 0) AS rounded_rating
FROM Listens
WHERE rating BETWEEN 4.0 AND 5.0

Step	Operation	Effect
1	FROM Listens	Load 9 rows
2	WHERE BETWEEN 4.0 AND 5.0	Filter to 5 rows
3	SELECT with ROUND	Transform rating values

Execution Order Matters

-- FAILS: Alias 'double_rating' doesn't exist in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8  -- ERROR!

-- WORKS: Use original expression in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8

Why It Matters:

Clause	Execution Order	Can Reference	Cannot Reference
FROM	1st	Table names	SELECT aliases
WHERE	2nd	Table columns	SELECT aliases
SELECT	3rd	Table columns, expressions	Future aliases

Understanding Aliases (AS Keyword)

What is an Alias?

An alias creates a new name for a column in your output using the AS keyword:

SELECT rating AS r           -- Creates alias 'r' for rating column
SELECT rating * 2 AS doubled  -- Creates alias 'doubled' for calculated value

Think of AS as a "rename" that happens at the very end of query execution.

🚨 The Alias Timing Problem

Why this fails:

-- ❌ ERROR: Can't use alias 'r' in WHERE
SELECT rating AS r 
FROM Listens 
WHERE r > 4.0  -- 'r' doesn't exist yet!

The execution timeline:

Step 1: FROM Listens     → TABLE loaded, columns: rating, user_id, song_id...
Step 2: WHERE r > 4.0    → Error! 'r' doesn't exist yet
Step 3: SELECT rating AS r → NOW 'r' alias IS created

The correct way:

-- ✅ WORKS: Use actual column name in WHERE
SELECT rating AS r 
FROM Listens 
WHERE rating > 4.0  -- Use original column name

More Alias Examples

-- ❌ WRONG: Using calculated alias in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8  -- Error! 'double_rating' created after WHERE

-- ✅ CORRECT: Repeat the calculation in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8     -- Calculate directly in WHERE

-- ✅ ALSO CORRECT: Use original column
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating > 4         -- Use base column for filtering

Quick Rule: Aliases live in SELECT, not WHERE

Clause	Can Use	Cannot Use
WHERE	Table columns (`rating`)	SELECT aliases (`r`)
SELECT	Table columns + Aliases	Future aliases