SELECT-FROM-WHERE: The SQL Foundation

Core Concept

🌊 Remember: SQL FloWS Like Water!

FROM → WHERE → SELECT

Even though you write SELECT first, SQL executes in FloWS order:

  1. FROM: Load the table

  2. WHERE: Filter rows

  3. SELECT: Choose columns and create aliases

💡 Key insight: Aliases (AS) are created in step 3, so you can't use them in step 2 (WHERE)!

SELECT-FROM-WHERE: The Data Transformation Pipeline Execution: FROM → WHERE → SELECT (FloWS like water) 1 FROM Listens Start with all 9 rows Row 1: rating=4.5 Row 2: rating=4.2 Row 3: rating=3.9 Row 4: rating=4.7 Row 5: rating=4.6 Row 6: rating=3.9 Row 7: rating=2.9 Row 8: rating=4.9 Row 9: rating=NULL 2 WHERE rating > 4.0 Filter to 5 rows Filter: rating > 4.0 ✓ 4.5 > 4.0 TRUE ✓ 4.2 > 4.0 TRUE ✗ 3.9 > 4.0 FALSE ✓ 4.7 > 4.0 TRUE ✓ 4.6 > 4.0 TRUE ✗ 3.9 > 4.0 FALSE ✗ 2.9 > 4.0 FALSE ✓ 4.9 > 4.0 TRUE ? NULL > 4.0 UNKNOWN 5 rows pass Rows: 1,2,4,5,8 3 SELECT user_id, song_id, rating Choose 3 of 5 columns user_id song_id rating 1 1 4.5 1 2 4.2 2 2 4.7 2 7 4.6 3 2 4.9 9 rows → 5 rows (WHERE) → 3 columns (SELECT) Transform data step-by-step through the pipeline

Key Rules

  1. Execution Order: FROM → WHERE → SELECT (FWS order, not SFW. FloWS like water)

  2. Row Filtering: WHERE evaluates each row individually

  3. Column Selection: SELECT determines output columns

Basic Query Structure: FloW...S (like water)

-- Get listens with high ratings (above 4.0)
SELECT user_id, song_id, rating    -- Step last: Choose only 3 columns for output
FROM Listens                       -- Step 1: Picks table with 9 rows
WHERE rating > 4.0                 -- Step 2: Filters 9 rows; keeps subset of rows

Debug Table (row by row)

listen_iduser_idsong_idratingActionShow in Result
1114.5✓ TRUE, Keep
2124.2✓ TRUE, Keep
3163.9✗ FALSE, Remove
4224.7✓ TRUE, Keep
5274.6✓ TRUE, Keep
6283.9✗ FALSE, Remove
7312.9✗ FALSE, Remove
8324.9✓ TRUE, Keep
936NULL? UNKNOWN, Remove

Output

user_idsong_idrating
114.5
124.2
224.7
274.6
324.9

Why SQL FloWS Like This?

SQL is declarative. But to start, think of it like Python:
def high_ratings(Listens):
    output = []
    for row in Listens:              # FROM Listens
        if row.rating > 4.0:         # WHERE rating > 4.0  
            output += [row.user_id,  # SELECT user_id,
                       row.song_id,  #        song_id,
                       row.rating]   #        rating
    return output

Declarative (SQL): "Get me listens with rating > 4.0" → Database figures out where/how

Procedural (Python): "Here's my list data struct from this file on this machine. Loop through each row, check condition, build output. Parallelize with these step1, step2, step3." → You spell out each step

Key insight: SELECT is like the final mapping - it comes last in execution but defines what you return!

SELECT * vs Specific Columns

-- Get all columns for user 1's listening history
SELECT * 
FROM Listens 
WHERE user_id = 1
listen_iduser_idsong_idratinglisten_time
1114.52024-08-30 14:35:00
2124.2NULL
3163.92024-08-29 10:15:00
-- Get only user_id and rating for user 1
SELECT user_id, rating 
FROM Listens 
WHERE user_id = 1
user_idrating
14.5
14.2
13.9

WHERE with Multiple Conditions

-- Get user 1's high-rated songs (rating > 4.0)
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 AND rating > 4.0
listen_iduser_idratingAND ResultAction
114.5✓ TRUEKeep
214.2✓ TRUEKeep
313.9✗ FALSERemove
424.7✗ FALSERemove
5-9......✗ FALSERemove
-- Get all user 1's listens OR any listen rated > 4.5
SELECT user_id, song_id, rating
FROM Listens
WHERE user_id = 1 OR rating > 4.5
listen_iduser_idratingCond 1: user_id=1Cond 2: rating>4.5OR ResultAction
114.5✓ TRUE✗ FALSE✓ TRUEKeep
214.2✓ TRUE✗ FALSE✓ TRUEKeep
313.9✓ TRUE✗ FALSE✓ TRUEKeep
424.7✗ FALSE✓ TRUE✓ TRUEKeep
524.6✗ FALSE✓ TRUE✓ TRUEKeep
623.9✗ FALSE✗ FALSE✗ FALSERemove
732.9✗ FALSE✗ FALSE✗ FALSERemove
834.9✗ FALSE✓ TRUE✓ TRUEKeep
93NULL✗ FALSE? UNKNOWN? UNKNOWNRemove

SELECT with Calculations

-- Calculate double rating and categorize by quality
SELECT 
    user_id,
    rating,
    rating * 2 AS double_rating,  -- AS means renaming columns
    CASE   -- CASE/WHEN: Creates conditional logic - like if-then-else in SQL
        WHEN rating >= 4.5 THEN 'Excellent'
        WHEN rating >= 4.0 THEN 'Good'
        ELSE 'Average'
    END AS category
FROM Listens
WHERE rating IS NOT NULL
user_idratingdouble_ratingCategoryAnnotations
14.59.0Excellent4.5 >= 4.5? Yes
14.28.4Good4.2 >= 4.5? No
4.2 >= 4.0? Yes
13.97.8Average3.9 >= 4.5? No
3.9 >= 4.0? No
24.79.4Excellent4.7 >= 4.5? Yes
24.69.2Excellent4.6 >= 4.5? Yes
23.97.8Average3.9 >= 4.5? No
3.9 >= 4.0? No
32.95.8Average2.9 >= 4.5? No
2.9 >= 4.0? No
34.99.8Excellent4.9 >= 4.5? Yes

bugfix: sept24. column header fixed to double_rating


Common Patterns

Pattern 1: Filter and Transform

-- Round ratings between 4.0 and 5.0
SELECT 
    user_id,
    ROUND(rating, 0) AS rounded_rating
FROM Listens
WHERE rating BETWEEN 4.0 AND 5.0
Step Operation Effect
1 FROM Listens Load 9 rows
2 WHERE BETWEEN 4.0 AND 5.0 Filter to 5 rows
3 SELECT with ROUND Transform rating values

Execution Order Matters

-- FAILS: Alias 'double_rating' doesn't exist in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8  -- ERROR!

-- WORKS: Use original expression in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8

Why It Matters:

Clause Execution Order Can Reference Cannot Reference
FROM 1st Table names SELECT aliases
WHERE 2nd Table columns SELECT aliases
SELECT 3rd Table columns, expressions Future aliases

Understanding Aliases (AS Keyword)

What is an Alias?

An alias creates a new name for a column in your output using the AS keyword:

SELECT rating AS r           -- Creates alias 'r' for rating column
SELECT rating * 2 AS doubled  -- Creates alias 'doubled' for calculated value

Think of AS as a "rename" that happens at the very end of query execution.

🚨 The Alias Timing Problem

Why this fails:

-- ❌ ERROR: Can't use alias 'r' in WHERE
SELECT rating AS r 
FROM Listens 
WHERE r > 4.0  -- 'r' doesn't exist yet!

The execution timeline:

Step 1: FROM ListensTABLE loaded, columns: rating, user_id, song_id...
Step 2: WHERE r > 4.0    → Error! 'r' doesn't exist yet
Step 3: SELECT rating AS r → NOW 'r' alias IS created

The correct way:

-- ✅ WORKS: Use actual column name in WHERE
SELECT rating AS r 
FROM Listens 
WHERE rating > 4.0  -- Use original column name

More Alias Examples

-- ❌ WRONG: Using calculated alias in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE double_rating > 8  -- Error! 'double_rating' created after WHERE

-- ✅ CORRECT: Repeat the calculation in WHERE
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating * 2 > 8     -- Calculate directly in WHERE

-- ✅ ALSO CORRECT: Use original column
SELECT rating * 2 AS double_rating
FROM Listens
WHERE rating > 4         -- Use base column for filtering

Quick Rule: Aliases live in SELECT, not WHERE

Clause Can Use Cannot Use
WHERE Table columns (rating) SELECT aliases (r)
SELECT Table columns + Aliases Future aliases

Common Mistakes Summary