LLM SQL Debugging: When AI Gets Logic Wrong
Core Concept
- Example: Same question, different logic and queries - one returns 4 users, another returns 1 (both syntactically correct!)
Myth: "LLMs Will Replace SQL Skills"
-
LLMs: Pros: Fast syntax. Con: Weak semantics and hallucinations
-
Context: LLMs can't fit your 10 GB or TB database into context/token windows
-
Cost: Full DB in context = bankruptcy :-)
The Sweet Spot: LLM generates boilerplate → Verify the logic using Debug Tables → Ship faster
Real-World Performance Data
The BIRD benchmark (https://bird-bench.github.io/) tests LLMs on real SQL generation:
-
Syntax Accuracy: 95%+ (most LLMs handle SQL syntax well)
-
Semantic Accuracy: 16% to 77% (varies significantly by model) -- September 2025
-
The Gap: Queries that run without errors but produce wrong results
Why you get paid the big bucks Focus on semantics. Deeply understand both SQL and business requirements!
Query Comparison: "Users who don't listen to Taylor Swift"
Query A: LEFT JOIN Approach
-- Find users who listened to non-Taylor songs (or no songs)
SELECT u.user_id, u.name
FROM Users u
LEFT JOIN Listens l ON u.user_id = l.user_id
LEFT JOIN Songs s ON l.song_id = s.song_id
WHERE s.artist != 'Taylor Swift' OR s.artist IS NULL
GROUP BY u.user_id, u.name
Query B: NOT IN Subquery Approach
-- Find users who never listened to any Taylor Swift song
SELECT u.user_id, u.name
FROM Users u
WHERE u.user_id NOT IN (
SELECT DISTINCT l.user_id
FROM Listens l
JOIN Songs s ON l.song_id = s.song_id
WHERE s.artist = 'Taylor Swift'
)
l.user_id
can't be NULL per our Spotify schema. Adding WHERE l.user_id IS NOT NULL
is still fine defensive practice.
Step-by-Step Debug Tables
Query A Execution Trace
Step 1: LEFT JOIN Users with Listens
user_id | name | listen_id | song_id | rating | Notes | |
---|---|---|---|---|---|---|
1 | Mickey | 1 | 1 | 4.5 | ||
1 | Mickey | 2 | 2 | 4.2 | ||
1 | Mickey | 3 | 6 | 3.9 | ||
2 | Minnie | 4 | 2 | 4.7 | ||
2 | Minnie | 5 | 7 | 4.6 | ||
2 | Minnie | 6 | 8 | 3.9 | ||
3 | Daffy | 7 | 1 | 2.9 | ||
3 | Daffy | 8 | 2 | 4.9 | ||
3 | Daffy | 9 | 6 | NULL | ||
4 | Pluto | NULL | NULL | NULL | No listens |
Step 2: LEFT JOIN with Songs (Add artist)
user_id | name | song_id | title | artist | Notes | |
---|---|---|---|---|---|---|
1 | Mickey | 1 | Evermore | Taylor Swift | ||
1 | Mickey | 2 | Willow | Taylor Swift | ||
1 | Mickey | 6 | Yesterday | Beatles | ||
2 | Minnie | 2 | Willow | Taylor Swift | ||
2 | Minnie | 7 | Yellow Submarine | Beatles | ||
2 | Minnie | 8 | Hey Jude | Beatles | ||
3 | Daffy | 1 | Evermore | Taylor Swift | ||
3 | Daffy | 2 | Willow | Taylor Swift | ||
3 | Daffy | 6 | Yesterday | Beatles | ||
4 | Pluto | NULL | NULL | NULL | No songs |
Step 3: WHERE filter (artist != 'Taylor Swift' OR artist IS NULL)
user_id | name | song_id | title | artist | Notes | |
---|---|---|---|---|---|---|
1 | Mickey | 6 | Yesterday | Beatles | Kept: artist != 'Taylor Swift' | |
2 | Minnie | 7 | Yellow Submarine | Beatles | Kept: artist != 'Taylor Swift' | |
2 | Minnie | 8 | Hey Jude | Beatles | Kept: artist != 'Taylor Swift' | |
3 | Daffy | 6 | Yesterday | Beatles | Kept: artist != 'Taylor Swift' | |
4 | Pluto | NULL | NULL | NULL | Kept: artist IS NULL |
Step 4: GROUP BY and Final Output
user_id | name | |
---|---|---|
1 | Mickey | |
2 | Minnie | |
3 | Daffy | |
4 | Pluto |
Query B Execution Trace
Step 1: Find Taylor Swift listeners
user_id | Notes | |
---|---|---|
1 | Listened to songs 1, 2 (Taylor Swift) | |
2 | Listened to song 2 (Taylor Swift) | |
3 | Listened to songs 1, 2 (Taylor Swift) |
Step 2: NOT IN filter
user_id | name | Notes | |
---|---|---|---|
1 | Mickey | Excluded: user_id IN {1,2,3} | |
2 | Minnie | Excluded: user_id IN {1,2,3} | |
3 | Daffy | Excluded: user_id IN {1,2,3} | |
4 | Pluto | Included: user_id NOT IN {1,2,3} |
Final Output
user_id | name | |
---|---|---|
4 | Pluto |
The Semantic Gap
Same Question, Different Answers: English is ambiguous. SQL is not.
-
English: Users who don't listen to Taylor Swift
-
Query A: "Users who listened to non-Taylor songs" (4 users)
-
Query B: "Users who never listened to Taylor" (1 user)
Debug Queries
-
Decompose: Break query into steps
-
Trace: Follow data through each step
-
Compare: Multiple queries = spot differences
-
Edge Cases: NULL, NOT IN, LEFT JOINs, ...