SQL's Complex Types
Good For Interviews; Syntax Optional For Tests
Core Concept
- Example: Find users within 5km, match DNA sequences, query JSON, find similar songs - all with SQL!
SQL isn't just for INT/STRING tables. It handles maps, genomes, documents, and AI.
1. JSONB or NoSQL
Don't paint yourself into a custom NoSQL DB for Wrong Reasons
Key idea:
- NoSQL became popular in the 2000s and 2010s. Popular use cases then: prototyping MVPs and for schema-less data.
- From 2010s, all modern DBs support JSON and JSONB (binary JSON) natively as another data type. This lets you combine SQL's speed with the flexibility of JSON (``NoSQL'').
// DocumentDBs OR NoSQL DBs: Great for MVPs AND real-time (e.g., Firebase, MongoDB)
// But Harder for transactions & joins
// PostgreSQL JSON: Best of both worlds
// Instead of MongoDB's documents:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
items JSONB, -- Flexible schema for prototyping where needed
created_at TIMESTAMP
);
// Get consistency + flexibility
2. Geographic Queries: Finding Things Nearby
-- Find all Spotify users within 5km of Stanford (BigQuery)
SELECT
u.name,
u.email,
-- ST_DISTANCE: Calculate distance between two geographic points in meters
ST_DISTANCE(u.location, ST_GEOGPOINT(-122.1697, 37.4275)) AS distance_meters
FROM Users u
WHERE
-- ST_DWITHIN: Check if two points are within a specified distance
ST_DWITHIN(
u.location,
ST_GEOGPOINT(-122.1697, 37.4275), -- Create a point from longitude, latitude
5000 -- Distance in meters
)
ORDER BY distance_meters;
3. Generative AI Queries: Semantic Understanding
-- Analyze song lyrics and reviews with Gemini Pro (BigQuery)
SELECT
title,
artist,
-- AI.CLASSIFY: Automatically categorize the song vibe
AI.CLASSIFY(lyrics, ["Happy", "Melancholic", "Energetic", "Chill"]) AS vibe,
-- AI.GENERATE: Use an LLM to write a marketing summary
AI.GENERATE(
"Summarize the emotional journey of these lyrics IN one sentence",
lyrics
) AS ai_summary,
-- AI.TRANSLATE: Localize the title for global markets
AI.TRANSLATE(title, "Spanish") AS title_es
FROM Songs
-- AI.IF: Semantic filtering based on natural language meaning
WHERE AI.IF("The song IS about a summer romance", lyrics)
AND play_count > 100000;
4. User-Defined Functions: Custom Logic
Create custom functions to transform your data: Add business logic directly in SQL.
Use Custom Functions
-- Score individual songs with custom functions
SELECT
s.title,
s.artist,
-- Secret algorithm for viral potential
calculateViralScore(
s.play_count,
s.skip_rate,
s.share_count
) AS viral_potential,
-- Analyze album art mood
getDominantColor(
s.album_cover_rgb
) AS mood_color,
s.play_count
FROM Songs s
WHERE s.release_year = 2024
AND s.play_count > 1000
ORDER BY viral_potential DESC
LIMIT 100;
Define Functions
-- Secret viral scoring algorithm
CREATE TEMP FUNCTION calculateViralScore(
plays INT64, skip_rate FLOAT64, shares INT64
)
RETURNS FLOAT64
AS (
plays * 0.3 + shares * 10 - skip_rate * 500
);
-- JavaScript for complex logic
CREATE TEMP FUNCTION getDominantColor(
rgb ARRAY<STRUCT<r INT64, g INT64, b INT64>>
)
RETURNS STRING
LANGUAGE js AS r"""
// Average RGB VALUES
let r = 0, g = 0, b = 0;
for (let pixel of rgb) {
r += pixel.r;
g += pixel.g;
b += pixel.b;
}
r /= rgb.length;
g /= rgb.length;
b /= rgb.length;
// Return dominant color
if (r > g && r > b) return 'Warm';
if (b > r && b > g) return 'Cool';
return 'Neutral';
""";
Key Concepts:
-
UDFs: Process individual rows with custom logic
-
UDAFs: Also possible - create custom aggregations like weighted averages
-
Languages: Use SQL for simple math, JavaScript for complex operations
-
Reusability: Define once, use everywhere in your queries
Same SQL, New Superpowers
The Pattern Stays the Same
Notice how every query above still uses the SQL you know: - SELECT columns or expressions - FROM tables (even if they contain exotic data) - WHERE conditions (using special functions) - ORDER BY to sort results
The core SQL structure never changes - we just add new functions for new data types!
Try It Yourself: BigQuery Public Datasets
-- Real queries you can run today (free tier)!
-- 1. Geographic: Find all Starbucks in San Francisco
SELECT name, address, geom
FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type = 'amenity'
AND tags['amenity'] = 'cafe'
AND tags['name'] LIKE '%Starbucks%'
AND ST_DWITHIN(geom, ST_GEOGPOINT(-122.4194, 37.7749), 10000);
-- 2. Genomic: Analyze COVID variants
SELECT variant_name, country, date, mutation_count
FROM `bigquery-public-data.covid19_genome_sequence.variants`
WHERE mutation_count > 10
ORDER BY date DESC;
-- 3. Weather: Temperature patterns
SELECT
name AS station_name,
MAX(temp) AS max_temp,
MIN(temp) AS min_temp,
AVG(wind) AS avg_wind
FROM `bigquery-public-data.noaa_gsod.gsod2024`
WHERE mo = '01' AND da = '01'
GROUP BY name
LIMIT 10;
The Evolution Continues
SQL Through the Decades:
1970s: Numbers and text
1990s: + Dates and binary data
2000s: + XML and spatial
2010s: + JSON and arrays
2020s: + GenAI, Vectors, and ML
2030s: + Quantum data?
Your SQL skills adapt to whatever comes next!