Writing Debug Tables: Your Logic Verifier
Core Concept
- Example: LLM says 4 users, manual trace shows 1 - debug tables catch semantic errors!
Debug tables verify your SQL logic - essential for logic debugging and whiteboard interviews.
Why This Matters
🤖 LLMs and Semantics
- Syntax: 95%+ accurate
- Semantics: 16-82% accurate per BIRD benchmarks
- Same question → 2 different queries → one returns 4 users, other returns 1
- See LLM Debug
📝 Whiteboard Interviews and Tests
- Filter out expensive LLM cut-and-pasters
- "Debug this query" - common question
- Semantics = what gets you hired
- See Learning Outcomes
⚡ Actually Faster
- Trial-and-error debug loop: 30 minutes
- Drawing one debug table: 5 minutes
- Have confidence in code: Instant
- Building intuition: Priceless
See It In Action
Example: Users with Multiple Genres
-- Find users who listen to 2+ songs per genre
SELECT user_id, genre, COUNT(*) AS song_count
FROM Listens l
JOIN Songs s ON l.song_id = s.song_id
WHERE genre IS NOT NULL
GROUP BY user_id, genre
HAVING COUNT(*) >= 2
Step-by-Step Debug Table:
Step 1: FROM + JOIN (All listen records with song info)
user_id | name | song_id | genre | rating | |
---|---|---|---|---|---|
1 | Mickey | 1 | Pop | 4.5 | |
1 | Mickey | 2 | Pop | 4.2 | |
1 | Mickey | 6 | Classic | 3.9 | |
2 | Minnie | 2 | Pop | 4.7 | |
2 | Minnie | 7 | Classic | 4.6 | |
2 | Minnie | 8 | Classic | 3.9 | |
3 | Daffy | 1 | Pop | 2.9 | |
3 | Daffy | 2 | Pop | 4.9 | |
3 | Daffy | 6 | Classic | NULL |
Step 2: GROUP BY user_id, genre (Create groups)
user_id | genre | Notes: songs_in_group | Notes | |
---|---|---|---|---|
1 | Pop | 2 rows | ✓ Keep (≥2) | |
1 | Classic | 1 row | ✗ Filter out | |
2 | Pop | 1 row | ✗ Filter out | |
2 | Classic | 2 rows | ✓ Keep (≥2) | |
3 | Pop | 2 rows | ✓ Keep (≥2) | |
3 | Classic | 1 row | ✗ Filter out |
Step 3: HAVING COUNT(*) >= 2 (Final output)
user_id | genre | song_count | |
---|---|---|---|
1 | Pop | 2 | |
2 | Classic | 2 | |
3 | Pop | 2 |
Shortcuts for Big Data
Nobody Traces 100,000 Rows!
Instead of This:
user_id | genre | rating
1 | Rock | 4.5
1 | Rock | 4.2
1 | Rock | 3.9
... (497 more Rock rows)
1 | Pop | 4.8
1 | Pop | 4.1
... (300 more Pop rows)
Use This Shorthand:
GROUP BY Summary:
User 1, Rock: ~500 rows, AVG ≈ 4.2
User 1, Pop: ~300 rows, AVG ≈ 4.5
User 2, Rock: ~200 rows, AVG ≈ 3.8
...
Edge cases checked:
- MAX(rating) = 5.0 ✓
- MIN(rating) = 1.0 ✓
- NULL ratings: 42 rows ✓
Simple Notation Tricks
Scenario | Full Trace | Smart Shorthand |
---|---|---|
1000 rows after JOIN | Show all 1000 | "First 3... middle ties... last 2" |
GROUP BY with counts | List every row | "Rock: ~500, Pop: ~300" |
Window function ranks | Show all ranks | "Top 3, any ties, bottom 2" |
Complex aggregation | Calculate exactly | "AVG ≈ 4.2" (close enough!) |
What Actually Matters
✓ Quick Verification Checklist
✓ JOINs: Expected row count? (not accidentally cartesian? INNER? LEFT? OUTER?)
✓ NULLs: Handled correctly? (IS NULL vs = NULL?)
✓ Groups: Right granularity? (user vs user+genre?)
✓ Aggregates: Makes sense? (COUNT(*) vs COUNT(column)?)
✓ Windows: Partitions correct? (RANK vs ROW_NUMBER?)
Remember: It's YOUR Tool
🎯 The Bottom Line
- Tool for correctness, not busy work - skip obvious parts, focus on confusing bits
- Like multiplication tables in school - use them often early to build intuition
- You'll naturally start skipping easy steps as your SQL brain develops
- Use simple conventions that interviewers and graders can follow (for partial credit!)
- Approximate liberally - "~500 rows" is perfectly fine
Debug tables caught that JOIN bug? Great! Drew them with crayons? Sure. Hieroglyphs only you can understand? Let's reconsider...
For Interviews, PSETs, Tests
❌ Cryptic Hieroglyphs
𓀀 → 𓂋 𓊖 = 2 𓆣 𓁶 → 𓂋 𓄿 = 1 ✗ 𓅱 → 𓂋 𓊖 = 2 𓆣 ∴ 𓆣 = {𓀀𓊖, 𓅱𓊖}
"Trust me, 𓀀 means Mickey and 𓆣 means keep..."
✓ Clear Convention
User 1, Pop: 2 songs → Keep ✓ User 2, Pop: 1 song → Filter ✗ User 3, Pop: 2 songs → Keep ✓ Final: {(1,Pop,2), (3,Pop,2)}
Interviewer: "I can follow this!" (+partial credit)