6.5 · SQL's Complex Types

Advanced Data Types

Before you jump into the deep end with a specialized new database, consider extending what you already have. Modern PostgreSQL is so versatile, with native support for JSONB, AI Vectors, and Geo-Spatial coordinates, that it effectively cannibalized dozens of specialized Vector database startups in the 2024–2025 AI boom.

Good For Interviews; Syntax Optional For Tests

Core Concept

  • Example: Use SQL to find users within 5km, match DNA sequences, query JSON, or find similar songs.

SQL isn't just for your basic INT or STRING tables. It's capable of handling maps, genomes, documents, and even AI.

SQL Powers Everything: From Maps to DNA to AI Geographic Data Stanford Mickey Minnie Daffy ST_DWITHIN(location, point, 5000) Genomic Data Reference: ACTGACTGACTG Sample 1: ACTCACTGACTG 1 mutation Sample 2: ACTGACTCACTC 2 mutations Sample 3: ACTGACTGACTG Perfect match = ACTG EDIT_DISTANCE(seq1, seq2) < 3 Special Data Types Images Time Series [1,2,3,4] Arrays {...} JSON Documents • UUID: a1b2c3d4-e5f6-... • Intervals: [2024-01-01, 2024-12-31] • Binary: 0x4D5A9000... UNNEST(array) • IMAGE_SIZE(blob) Generative AI (LLMs) PROMPT: "Explain these lyrics..." GEMINI RESPONSE: "This song depicts a journey of resilience..." "Categorized as: 'Inspirational'" AI.GENERATE(prompt, lyrics) All Still SQL! SELECT ... FROM ... WHERE ... ORDER BY ...

1. JSONB or NoSQL

Avoid the NoSQL Trap for the Wrong Reasons

Key idea:

  1. Your startup doesn't need a sprawling NoSQL database just because there's some JSON involved.
  2. One PostgreSQL node with JSONB can handle your MVP efficiently, providing ACID compliance long before NoSQL becomes necessary. Modern SQL offers the speed and reliability of relational data while natively supporting JSON's flexibility.
// DocumentDBs OR NoSQL DBs: Great for MVPs AND real-time (e.g., Firebase, MongoDB)
// But Harder for transactions & joins
// PostgreSQL JSON: Best of both worlds

// Instead of MongoDB's documents:
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  items JSONB,  -- Flexible schema for prototyping where needed
  created_at TIMESTAMP
);
// Get consistency + flexibility

2. Geographic Queries: Finding Things Nearby

-- Find all Spotify users within 5km of Stanford (BigQuery)
SELECT 
  u.name,
  u.email,
  -- ST_DISTANCE: Calculate distance between two geographic points in meters
  ST_DISTANCE(u.location, ST_GEOGPOINT(-122.1697, 37.4275)) AS distance_meters
FROM Users u
WHERE 
  -- ST_DWITHIN: Check if two points are within a specified distance
  ST_DWITHIN(
    u.location, 
    ST_GEOGPOINT(-122.1697, 37.4275),  -- Create a point from longitude, latitude
    5000  -- Distance in meters
  )
ORDER BY distance_meters;

3. Generative AI Queries: Semantic Understanding

-- Analyze song lyrics and reviews with Gemini Pro (BigQuery)
SELECT 
  title,
  artist,
  -- AI.CLASSIFY: Automatically categorize the song vibe
  AI.CLASSIFY(lyrics, ["Happy", "Melancholic", "Energetic", "Chill"]) AS vibe,

  -- AI.GENERATE: Use an LLM to write a marketing summary
  AI.GENERATE(
    "Summarize the emotional journey of these lyrics IN one sentence", 
    lyrics
  ) AS ai_summary,

  -- AI.TRANSLATE: Localize the title for global markets
  AI.TRANSLATE(title, "Spanish") AS title_es
FROM Songs
-- AI.IF: Semantic filtering based on natural language meaning
WHERE AI.IF("The song IS about a summer romance", lyrics)
  AND play_count > 100000;

4. User-Defined Functions: Custom Logic

Embed custom functions to tailor your data transformations: Implement business logic directly in SQL.

Define Once

-- SQL UDF: simple, fast, inlined
CREATE TEMP FUNCTION viralScore(
  plays  INT64,
  shares INT64
)
RETURNS FLOAT64
AS (plays * 0.3 + shares * 10);


-- JavaScript UDF: when SQL alone
-- is not expressive enough
CREATE TEMP FUNCTION normalizeTitle(
  t STRING
)
RETURNS STRING
LANGUAGE js AS """
  return t.toLowerCase().trim();
""";

Use Anywhere

SELECT
  normalizeTitle(s.title) AS title,
  s.artist,
  viralScore(
    s.play_count,
    s.share_count
  ) AS viral_potential
FROM Songs s
WHERE s.release_year = 2024
  AND s.play_count > 1000
ORDER BY viral_potential DESC
LIMIT 100;

Key Concepts:

  • UDFs: Apply custom logic to individual rows

  • UDAFs: Create custom aggregations like weighted averages

  • Languages: Use SQL for simple tasks, JavaScript for complex operations

  • Reusability: Define once, apply across your queries


Same SQL, New Superpowers

The Pattern Stays the Same

The core SQL structure remains unchanged: - SELECT columns or expressions - FROM tables (even those with complex data) - WHERE conditions (leveraging special functions) - ORDER BY to sort results

We just introduce new functions to handle new data types.


Try It Yourself: BigQuery Public Datasets

-- Real queries you can run today (free tier)!

-- 1. Geographic: Find all Starbucks in San Francisco
SELECT name, address, geom
FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type = 'amenity' 
  AND tags['amenity'] = 'cafe'
  AND tags['name'] LIKE '%Starbucks%'
  AND ST_DWITHIN(geom, ST_GEOGPOINT(-122.4194, 37.7749), 10000);

-- 2. Genomic: Analyze COVID variants
SELECT variant_name, country, date, mutation_count
FROM `bigquery-public-data.covid19_genome_sequence.variants`
WHERE mutation_count > 10
ORDER BY date DESC;

-- 3. Weather: Temperature patterns
SELECT 
  name AS station_name,
  MAX(temp) AS max_temp,
  MIN(temp) AS min_temp,
  AVG(wind) AS avg_wind
FROM `bigquery-public-data.noaa_gsod.gsod2024`
WHERE mo = '01' AND da = '01'
GROUP BY name
LIMIT 10;

The Evolution Continues

SQL Through the Decades:
1970s: Numbers and text
1990s: + Dates and binary data
2000s: + XML and spatial
2010s: + JSON and arrays  
2020s: + GenAI, Vectors, and ML
2030s: + Quantum data?

Your SQL skills adapt to whatever comes next!