SQL's Complex Types
Advanced Data Types
Before you jump into the deep end with a new system like Vector or Spatial databases, consider extending what you already have. PostgreSQL is more versatile than you think, with native support for JSONB, AI Vectors, and Geo-Spatial coordinates.
Good For Interviews; Syntax Optional For Tests
Core Concept
- Example: Use SQL to find users within 5km, match DNA sequences, query JSON, or find similar songs.
SQL isn't just for your basic INT or STRING tables. It's capable of handling maps, genomes, documents, and even AI.
1. JSONB or NoSQL
Avoid the NoSQL Trap for the Wrong Reasons
Key idea:
- Your startup doesn't need a sprawling NoSQL database just because there's some JSON involved.
- One PostgreSQL node with JSONB can handle your MVP efficiently, providing ACID compliance long before NoSQL becomes necessary. Modern SQL offers the speed and reliability of relational data while natively supporting JSON's flexibility.
// DocumentDBs OR NoSQL DBs: Great for MVPs AND real-time (e.g., Firebase, MongoDB)
// But Harder for transactions & joins
// PostgreSQL JSON: Best of both worlds
// Instead of MongoDB's documents:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
items JSONB, -- Flexible schema for prototyping where needed
created_at TIMESTAMP
);
// Get consistency + flexibility
2. Geographic Queries: Finding Things Nearby
-- Find all Spotify users within 5km of Stanford (BigQuery)
SELECT
u.name,
u.email,
-- ST_DISTANCE: Calculate distance between two geographic points in meters
ST_DISTANCE(u.location, ST_GEOGPOINT(-122.1697, 37.4275)) AS distance_meters
FROM Users u
WHERE
-- ST_DWITHIN: Check if two points are within a specified distance
ST_DWITHIN(
u.location,
ST_GEOGPOINT(-122.1697, 37.4275), -- Create a point from longitude, latitude
5000 -- Distance in meters
)
ORDER BY distance_meters;
3. Generative AI Queries: Semantic Understanding
-- Analyze song lyrics and reviews with Gemini Pro (BigQuery)
SELECT
title,
artist,
-- AI.CLASSIFY: Automatically categorize the song vibe
AI.CLASSIFY(lyrics, ["Happy", "Melancholic", "Energetic", "Chill"]) AS vibe,
-- AI.GENERATE: Use an LLM to write a marketing summary
AI.GENERATE(
"Summarize the emotional journey of these lyrics IN one sentence",
lyrics
) AS ai_summary,
-- AI.TRANSLATE: Localize the title for global markets
AI.TRANSLATE(title, "Spanish") AS title_es
FROM Songs
-- AI.IF: Semantic filtering based on natural language meaning
WHERE AI.IF("The song IS about a summer romance", lyrics)
AND play_count > 100000;
4. User-Defined Functions: Custom Logic
Embed custom functions to tailor your data transformations: Implement business logic directly in SQL.
Use Custom Functions
-- Score individual songs with custom functions
SELECT
s.title,
s.artist,
-- Secret algorithm for viral potential
calculateViralScore(
s.play_count,
s.skip_rate,
s.share_count
) AS viral_potential,
-- Analyze album art mood
getDominantColor(
s.album_cover_rgb
) AS mood_color,
s.play_count
FROM Songs s
WHERE s.release_year = 2024
AND s.play_count > 1000
ORDER BY viral_potential DESC
LIMIT 100;
Define Functions
-- Secret viral scoring algorithm
CREATE TEMP FUNCTION calculateViralScore(
plays INT64, skip_rate FLOAT64, shares INT64
)
RETURNS FLOAT64
AS (
plays * 0.3 + shares * 10 - skip_rate * 500
);
-- JavaScript for complex logic
CREATE TEMP FUNCTION getDominantColor(
rgb ARRAY<STRUCT<r INT64, g INT64, b INT64>>
)
RETURNS STRING
LANGUAGE js AS r"""
// Average RGB VALUES
let r = 0, g = 0, b = 0;
for (let pixel of rgb) {
r += pixel.r;
g += pixel.g;
b += pixel.b;
}
r /= rgb.length;
g /= rgb.length;
b /= rgb.length;
// Return dominant color
if (r > g && r > b) return 'Warm';
if (b > r && b > g) return 'Cool';
return 'Neutral';
""";
Key Concepts:
-
UDFs: Apply custom logic to individual rows
-
UDAFs: Create custom aggregations like weighted averages
-
Languages: Use SQL for simple tasks, JavaScript for complex operations
-
Reusability: Define once, apply across your queries
Same SQL, New Superpowers
The Pattern Stays the Same
The core SQL structure remains unchanged: - SELECT columns or expressions - FROM tables (even those with complex data) - WHERE conditions (leveraging special functions) - ORDER BY to sort results
We just introduce new functions to handle new data types.
Try It Yourself: BigQuery Public Datasets
-- Real queries you can run today (free tier)!
-- 1. Geographic: Find all Starbucks in San Francisco
SELECT name, address, geom
FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type = 'amenity'
AND tags['amenity'] = 'cafe'
AND tags['name'] LIKE '%Starbucks%'
AND ST_DWITHIN(geom, ST_GEOGPOINT(-122.4194, 37.7749), 10000);
-- 2. Genomic: Analyze COVID variants
SELECT variant_name, country, date, mutation_count
FROM `bigquery-public-data.covid19_genome_sequence.variants`
WHERE mutation_count > 10
ORDER BY date DESC;
-- 3. Weather: Temperature patterns
SELECT
name AS station_name,
MAX(temp) AS max_temp,
MIN(temp) AS min_temp,
AVG(wind) AS avg_wind
FROM `bigquery-public-data.noaa_gsod.gsod2024`
WHERE mo = '01' AND da = '01'
GROUP BY name
LIMIT 10;
The Evolution Continues
SQL Through the Decades:
1970s: Numbers and text
1990s: + Dates and binary data
2000s: + XML and spatial
2010s: + JSON and arrays
2020s: + GenAI, Vectors, and ML
2030s: + Quantum data?
Your SQL skills adapt to whatever comes next!