SELECT-FROM-WHERE: The SQL Foundation

Core Concept

SELECT-FROM-WHERE: The Data Transformation Pipeline Execution: FROM → WHERE → SELECT (FloWS like water) 1 FROM Listens Start with all 9 rows Row 1: rating=4.5 Row 2: rating=4.2 Row 3: rating=3.9 Row 4: rating=4.7 Row 5: rating=4.6 Row 6: rating=3.9 Row 7: rating=2.9 Row 8: rating=4.9 Row 9: rating=NULL 2 WHERE rating > 4.0 Filter to 5 rows Filter: rating > 4.0 ✓ 4.5 > 4.0 TRUE ✓ 4.2 > 4.0 TRUE ✗ 3.9 > 4.0 FALSE ✓ 4.7 > 4.0 TRUE ✓ 4.6 > 4.0 TRUE ✗ 3.9 > 4.0 FALSE ✗ 2.9 > 4.0 FALSE ✓ 4.9 > 4.0 TRUE ? NULL > 4.0 UNKNOWN 5 rows pass Rows: 1,2,4,5,8 3 SELECT user_id, song_id, rating Choose 3 of 5 columns user_id song_id rating 1 1 4.5 1 2 4.2 2 2 4.7 2 7 4.6 3 2 4.9 9 rows → 5 rows (WHERE) → 3 columns (SELECT) Transform data step-by-step through the pipeline

Basic Query Structure: FloW...S (like water)

-- Get listens with high ratings (above 4.0)
SELECT user_id, song_id, rating    -- Step last: Choose only 3 columns for output
FROM Listens                       -- Step 1: Picks table with 9 rows
WHERE rating > 4.0                 -- Step 2: Filters 9 rows; keeps subset of rows

  1. Execution Order: FROM → WHERE → SELECT (FWS order, not SFW. FloWS like water)

  2. Row Filtering: WHERE evaluates each row individually

  3. Column Selection: SELECT determines output columns

Debug Table (row by row)

listen_iduser_idsong_idratingActionShow in Result
1114.5✓ TRUE, Keep
2124.2✓ TRUE, Keep
3163.9✗ FALSE, Remove
4224.7✓ TRUE, Keep
5274.6✓ TRUE, Keep
6283.9✗ FALSE, Remove
7312.9✗ FALSE, Remove
8324.9✓ TRUE, Keep
936NULL? UNKNOWN, Remove

Output

user_idsong_idrating
114.5
124.2
224.7
274.6
324.9

Why SQL FloWS Like This?

SQL is declarative. But to start, think of it like Python:
def high_ratings(Listens):
    output = []
    for row in Listens:              # FROM Listens
        if row.rating > 4.0:         # WHERE rating > 4.0  
            output += [row.user_id,  # SELECT user_id,
                       row.song_id,  #        song_id,
                       row.rating]   #        rating
    return output

Declarative (SQL): "Get me listens with rating > 4.0" → Database figures out where/how

Procedural (Python): "Here's my list data struct from this file on this machine. Loop through each row, check condition, build output. Parallelize with these step1, step2, step3." → You spell out each step

Key insight: SELECT is like the final mapping - it comes last in execution but defines what you return!

SELECT * vs Specific Columns

-- Get all columns for user 1's listening history
SELECT * 
FROM Listens 
WHERE user_id = 1
listen_iduser_idsong_idratinglisten_time
1114.52024-08-30 14:35:00
2124.2NULL
3163.92024-08-29 10:15:00
-- Get only user_id and rating for user 1
SELECT user_id, rating 
FROM Listens 
WHERE user_id = 1
user_idrating
14.5
14.2
13.9

WHERE with Multiple Conditions

-- Get user 1's high-rated songs (rating > 4.0)
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 AND rating > 4.0
listen_iduser_idratingAND ResultAction
114.5✓ TRUEKeep
214.2✓ TRUEKeep
313.9✗ FALSERemove
424.7✗ FALSERemove
5-9......✗ FALSERemove
-- Get all user 1's listens OR any listen rated > 4.5
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 OR rating > 4.5
listen_iduser_idratingCond 1: user_id=1Cond 2: rating>4.5OR ResultAction
114.5✓ TRUE✗ FALSE✓ TRUEKeep
214.2✓ TRUE✗ FALSE✓ TRUEKeep
313.9✓ TRUE✗ FALSE✓ TRUEKeep
424.7✗ FALSE✓ TRUE✓ TRUEKeep
524.6✗ FALSE✓ TRUE✓ TRUEKeep
623.9✗ FALSE✗ FALSE✗ FALSERemove
732.9✗ FALSE✗ FALSE✗ FALSERemove
834.9✗ FALSE✓ TRUE✓ TRUEKeep
93NULL✗ FALSE? UNKNOWN? UNKNOWNRemove

SELECT with Calculations

-- Calculate double rating and categorize by quality
SELECT 
    user_id,
    rating,
    rating * 2 AS double_rating,  -- AS means renaming columns
    CASE   -- CASE/WHEN: Creates conditional logic - like if-then-else in SQL
        WHEN rating >= 4.5 THEN 'Excellent'
        WHEN rating >= 4.0 THEN 'Good'
        ELSE 'Average'
    END AS category
FROM Listens
WHERE rating IS NOT NULL
user_idratingdouble_ratingCategoryAnnotations
14.59.0Excellent4.5 >= 4.5? Yes
14.28.4Good4.2 >= 4.5? No
4.2 >= 4.0? Yes
13.97.8Average3.9 >= 4.5? No
3.9 >= 4.0? No
24.79.4Excellent4.7 >= 4.5? Yes
24.69.2Excellent4.6 >= 4.5? Yes
23.97.8Average3.9 >= 4.5? No
3.9 >= 4.0? No
32.95.8Average2.9 >= 4.5? No
2.9 >= 4.0? No
34.99.8Excellent4.9 >= 4.5? Yes

Common Patterns

Pattern 1: Filter and Transform

-- Round ratings between 4.0 and 5.0
SELECT 
    user_id,
    ROUND(rating, 0) AS rounded_rating
FROM Listens
WHERE rating BETWEEN 4.0 AND 5.0
Step Operation Effect
1 FROM Listens Load 9 rows
2 WHERE BETWEEN 4.0 AND 5.0 Filter to 5 rows
3 SELECT with ROUND Transform rating values

Understanding Aliases (AS Keyword)

What is an Alias?

An alias creates a new name for a column in your output using the AS keyword:

-- FAILS: Alias 'double_rating' doesn't exist in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8  -- ERROR!

-- WORKS: Use original expression in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8

Why It Matters:

Clause Execution Order Can Reference Cannot Reference
FROM 1st Table names SELECT aliases
WHERE 2nd Table columns SELECT aliases
SELECT 3rd Table columns, expressions Future aliases