LLMs and SQL: Syntax and Semantics

Concept. LLM-generated SQL fails most often on NULL handling, JOIN cardinality, and the WHERE-vs-HAVING distinction — three classes of bug you can audit one clause at a time.

Intuition. An LLM might write a query against Listens that returns 4 users when only 3 exist — usually a bad JOIN on user_id or a NULL it didn't account for. Audit each clause against the schema: JOIN keys, WHERE predicates, aggregate vs. row-level filters.

Example. Same question, different logic and queries - one returns 4 users, another returns 1. Both are syntactically correct, yet semantically distinct.

Myth: "LLMs Will Replace SQL Skills"

Current Text-to-SQL models excel at basic retrieval but consistently hallucinate JOINs across complex schemas. Because you cannot feed a multi-terabyte database into an LLM context window, models routinely guess at the underlying domain logic.

The Reality: LLMs draft the boilerplate schema syntax. Human engineers use CTEs and Query Equivalence to verify and debug the semantic logic.

Real-World Performance Data

The BIRD benchmark (https://bird-bench.github.io/) evaluates LLMs on SQL generation:

  • Syntax Accuracy: Over 95%—most LLMs ace SQL syntax.

  • Semantic Accuracy: Ranges from 16% to 77%—a glaring gap.

  • The Gap: Queries that run without errors but deliver incorrect results.

Why Learn SQL? A query that executes without crashing does not mean it's correct. You must check semantics.


Generating Syntax with LLMs

Before debugging logic, ensure a syntactically correct foundation. Supply the right context:

  1. The Exact Schema Constraints: Include precise CREATE TABLE statements or schema definitions.

  2. The Relational Paths: Clearly state table joins. Don't let the LLM guess.

  3. The Execution Engine: Specify the SQL dialect—PostgreSQL, BigQuery, SQLite.

Semantic Traps to Avoid: Even with perfect context, LLMs stumble on semantics. Always check for:

  • Silent Aggregation Errors: Misused joins can inflate results.

  • NULL Handling: LLMs often mishandle NULL in subqueries or joins.

  • Domain Logic Assumptions: LLMs might misinterpret business logic. Be explicit.

An Example of a Good Prompt Template

Reduce hallucinations with a structured prompt.

Example Prompt:

[Engine] I am using PostgreSQL.

[Schema] Here are my tables: Users (user_id INT, name VARCHAR, join_date DATE) Purchases (purchase_id INT, user_id INT, amount DECIMAL, status VARCHAR) Purchases.user_id joins to Users.user_id.

[Definitions & Logic] - "Active Users": Users who joined before '2025-01-01'. - "Top Users": Users with a total purchase amount > $500 where status = 'COMPLETED'.

[The Request] Write a query to find the name and total revenue of all Active, Top Users. Exclude users without completed purchases.


Query Comparison: "Users who don't listen to Taylor Swift"

Query A: LEFT JOIN Approach

-- Find users who listened to non-Taylor songs (or no songs)
SELECT u.user_id, u.name
FROM Users u
LEFT JOIN Listens l ON u.user_id = l.user_id
LEFT JOIN Songs s ON l.song_id = s.song_id
WHERE s.artist != 'Taylor Swift' OR s.artist IS NULL
GROUP BY u.user_id, u.name

Query B: NOT IN Subquery Approach

-- Find users who never listened to any Taylor Swift song
SELECT u.user_id, u.name
FROM Users u
WHERE u.user_id NOT IN (
    SELECT DISTINCT l.user_id
    FROM Listens l
    JOIN Songs s ON l.song_id = s.song_id
    WHERE s.artist = 'Taylor Swift'
)
Reminder: Beware the NULL in NOT IN death trap! Here we're safe since l.user_id can't be NULL per our Spotify schema. Adding WHERE l.user_id IS NOT NULL is still fine defensive practice.

Step-by-Step Debug Tables

Query A Execution Trace

Step 1: LEFT JOIN Users with Listens

user_idnamelisten_idsong_idratingNotes
1Mickey114.5
1Mickey224.2
1Mickey363.9
2Minnie424.7
2Minnie574.6
2Minnie683.9
3Daffy712.9
3Daffy824.9
3Daffy96NULL
4PlutoNULLNULLNULLNo listens

Step 2: LEFT JOIN with Songs (Add artist)

user_idnamesong_idtitleartistNotes
1Mickey1EvermoreTaylor Swift
1Mickey2WillowTaylor Swift
1Mickey6YesterdayBeatles
2Minnie2WillowTaylor Swift
2Minnie7Yellow SubmarineBeatles
2Minnie8Hey JudeBeatles
3Daffy1EvermoreTaylor Swift
3Daffy2WillowTaylor Swift
3Daffy6YesterdayBeatles
4PlutoNULLNULLNULLNo songs

Step 3: WHERE filter (artist != 'Taylor Swift' OR artist IS NULL)

user_idnamesong_idtitleartistNotes
1Mickey6YesterdayBeatlesKept: artist != 'Taylor Swift'
2Minnie7Yellow SubmarineBeatlesKept: artist != 'Taylor Swift'
2Minnie8Hey JudeBeatlesKept: artist != 'Taylor Swift'
3Daffy6YesterdayBeatlesKept: artist != 'Taylor Swift'
4PlutoNULLNULLNULLKept: artist IS NULL

Step 4: GROUP BY and Final Output

user_idname
1Mickey
2Minnie
3Daffy
4Pluto

Query B Execution Trace

Step 1: Find Taylor Swift listeners

user_idNotes
1Listened to songs 1, 2 (Taylor Swift)
2Listened to song 2 (Taylor Swift)
3Listened to songs 1, 2 (Taylor Swift)

Step 2: NOT IN filter

user_idnameNotes
1MickeyExcluded: user_id IN {1,2,3}
2MinnieExcluded: user_id IN {1,2,3}
3DaffyExcluded: user_id IN {1,2,3}
4PlutoIncluded: user_id NOT IN {1,2,3}

Final Output

user_idname
4Pluto

The Semantic Gap

Same Question, Different Answers: English is ambiguous. SQL is not.

  • English: Users who don't listen to Taylor Swift

  • Query A: "Users who listened to non-Taylor songs" (4 users)

  • Query B: "Users who never listened to Taylor" (1 user)