SELECT-FROM-WHERE: The SQL Foundation

Core Concept

SELECT-FROM-WHERE: The Data Transformation Pipeline Execution: FROM → WHERE → SELECT (FloWS like water) 1 FROM Listens Start with all 9 rows Row 1: rating=4.5 Row 2: rating=4.2 Row 3: rating=3.9 Row 4: rating=4.7 Row 5: rating=4.6 Row 6: rating=3.9 Row 7: rating=2.9 Row 8: rating=4.9 Row 9: rating=NULL 2 WHERE rating > 4.0 Filter to 5 rows Filter: rating > 4.0 ✓ 4.5 > 4.0 TRUE ✓ 4.2 > 4.0 TRUE ✗ 3.9 > 4.0 FALSE ✓ 4.7 > 4.0 TRUE ✓ 4.6 > 4.0 TRUE ✗ 3.9 > 4.0 FALSE ✗ 2.9 > 4.0 FALSE ✓ 4.9 > 4.0 TRUE ? NULL > 4.0 UNKNOWN 5 rows pass Rows: 1,2,4,5,8 3 SELECT user_id, song_id, rating Choose 3 of 5 columns user_id song_id rating 1 1 4.5 1 2 4.2 2 2 4.7 2 7 4.6 3 2 4.9 9 rows → 5 rows (WHERE) → 3 columns (SELECT) Transform data step-by-step through the pipeline

Key Rules

  1. Execution Order: FROM → WHERE → SELECT (FWS order, not SFW. FloWS like water)

  2. Row Filtering: WHERE evaluates each row individually

  3. Column Selection: SELECT determines output columns

Basic Query Structure: FloW...S (like water)

-- Get listens with high ratings (above 4.0)
SELECT user_id, song_id, rating    -- Step last: Choose only 3 columns for output
FROM Listens                       -- Step 1: Picks table with 9 rows
WHERE rating > 4.0                 -- Step 2: Filters 9 rows; keeps subset of rows

Debug Table (row by row)

listen_iduser_idsong_idratingActionShow in Result
1114.5✓ TRUE, Keep
2124.2✓ TRUE, Keep
3163.9✗ FALSE, Remove
4224.7✓ TRUE, Keep
5274.6✓ TRUE, Keep
6283.9✗ FALSE, Remove
7312.9✗ FALSE, Remove
8324.9✓ TRUE, Keep
936NULL? UNKNOWN, Remove

Output

user_idsong_idrating
114.5
124.2
224.7
274.6
324.9

Why SQL FloWS Like This?

SQL is declarative. But to start, think of it like Python:
def high_ratings(Listens):
    output = []
    for row in Listens:              # FROM Listens
        if row.rating > 4.0:         # WHERE rating > 4.0  
            output += [row.user_id,  # SELECT user_id,
                       row.song_id,  #        song_id,
                       row.rating]   #        rating
    return output

Declarative (SQL): "Get me listens with rating > 4.0" → Database figures out where/how

Procedural (Python): "Here's my list data struct from this file on this machine. Loop through each row, check condition, build output. Parallelize with these step1, step2, step3." → You spell out each step

Key insight: SELECT is like the final mapping - it comes last in execution but defines what you return!

SELECT * vs Specific Columns

-- Get all columns for user 1's listening history
SELECT * 
FROM Listens 
WHERE user_id = 1
listen_iduser_idsong_idratinglisten_time
1114.52024-08-30 14:35:00
2124.2NULL
3163.92024-08-29 10:15:00
-- Get only user_id and rating for user 1
SELECT user_id, rating 
FROM Listens 
WHERE user_id = 1
user_idrating
14.5
14.2
13.9

WHERE with Multiple Conditions

-- Get user 1's high-rated songs (rating > 4.0)
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 AND rating > 4.0
listen_iduser_idratingAND ResultAction
114.5✓ TRUEKeep
214.2✓ TRUEKeep
313.9✗ FALSERemove
424.7✗ FALSERemove
5-9......✗ FALSERemove
-- Get all user 1's listens OR any listen rated > 4.5
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 OR rating > 4.5
listen_iduser_idratingCond 1: user_id=1Cond 2: rating>4.5OR ResultAction
114.5✓ TRUE✗ FALSE✓ TRUEKeep
214.2✓ TRUE✗ FALSE✓ TRUEKeep
313.9✓ TRUE✗ FALSE✓ TRUEKeep
424.7✗ FALSE✓ TRUE✓ TRUEKeep
524.6✗ FALSE✓ TRUE✓ TRUEKeep
623.9✗ FALSE✗ FALSE✗ FALSERemove
732.9✗ FALSE✗ FALSE✗ FALSERemove
834.9✗ FALSE✓ TRUE✓ TRUEKeep
93NULL✗ FALSE? UNKNOWN? UNKNOWNRemove

SELECT with Calculations

-- Calculate double rating and categorize by quality
SELECT 
    user_id,
    rating,
    rating * 2 AS double_rating,  -- AS means renaming columns
    CASE   -- CASE/WHEN: Creates conditional logic - like if-then-else in SQL
        WHEN rating >= 4.5 THEN 'Excellent'
        WHEN rating >= 4.0 THEN 'Good'
        ELSE 'Average'
    END AS category
FROM Listens
WHERE rating IS NOT NULL
user_idratingdouble_ratingCategoryAnnotations
14.59.0Excellent4.5 >= 4.5? Yes
14.28.4Good4.2 >= 4.5? No
4.2 >= 4.0? Yes
13.97.8Average3.9 >= 4.5? No
3.9 >= 4.0? No
24.79.4Excellent4.7 >= 4.5? Yes
24.69.2Excellent4.6 >= 4.5? Yes
23.97.8Average3.9 >= 4.5? No
3.9 >= 4.0? No
32.95.8Average2.9 >= 4.5? No
2.9 >= 4.0? No
34.99.8Excellent4.9 >= 4.5? Yes

bugfix: sept24. column header fixed to double_rating


Common Patterns

Pattern 1: Filter and Transform

-- Round ratings between 4.0 and 5.0
SELECT 
    user_id,
    ROUND(rating, 0) AS rounded_rating
FROM Listens
WHERE rating BETWEEN 4.0 AND 5.0
Step Operation Effect
1 FROM Listens Load 9 rows
2 WHERE BETWEEN 4.0 AND 5.0 Filter to 5 rows
3 SELECT with ROUND Transform rating values

Execution Order Matters

-- FAILS: Alias 'double_rating' doesn't exist in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8  -- ERROR!

-- WORKS: Use original expression in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8

Why It Matters:

Clause Execution Order Can Reference Cannot Reference
FROM 1st Table names SELECT aliases
WHERE 2nd Table columns SELECT aliases
SELECT 3rd Table columns, expressions Future aliases

Common Mistakes

The Alias Timing Problem

-- WRONG: Alias r not yet created in WHERE
SELECT rating AS r FROM Listens WHERE r > 4.0

-- CORRECT: Use actual column name in WHERE
SELECT rating AS r FROM Listens WHERE rating > 4.0