SQL's Complex Types

Good For Interviews; Syntax Optional For Tests

Core Concept

SQL isn't just for INT/STRING tables. It handles maps, genomes, documents, and AI.

SQL Powers Everything: From Maps to DNA to AI Geographic Data Stanford Mickey Minnie Daffy ST_DWITHIN(location, point, 5000) Genomic Data Reference: ACTGACTGACTG Sample 1: ACTCACTGACTG 1 mutation Sample 2: ACTGACTCACTC 2 mutations Sample 3: ACTGACTGACTG Perfect match = ACTG EDIT_DISTANCE(seq1, seq2) < 3 Special Data Types Images Time Series [1,2,3,4] Arrays {...} JSON Documents • UUID: a1b2c3d4-e5f6-... • Intervals: [2024-01-01, 2024-12-31] • Binary: 0x4D5A9000... UNNEST(array) • IMAGE_SIZE(blob) Generative AI (LLMs) PROMPT: "Explain these lyrics..." GEMINI RESPONSE: "This song depicts a journey of resilience..." "Categorized as: 'Inspirational'" AI.GENERATE(prompt, lyrics) All Still SQL! SELECT ... FROM ... WHERE ... ORDER BY ...

1. JSONB or NoSQL

Don't paint yourself into a custom NoSQL DB for Wrong Reasons

Key idea:

  1. NoSQL became popular in the 2000s and 2010s. Popular use cases then: prototyping MVPs and for schema-less data.
  2. From 2010s, all modern DBs support JSON and JSONB (binary JSON) natively as another data type. This lets you combine SQL's speed with the flexibility of JSON (``NoSQL'').
// DocumentDBs OR NoSQL DBs: Great for MVPs AND real-time (e.g., Firebase, MongoDB)
// But Harder for transactions & joins
// PostgreSQL JSON: Best of both worlds

// Instead of MongoDB's documents:
CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  items JSONB,  -- Flexible schema for prototyping where needed
  created_at TIMESTAMP
);
// Get consistency + flexibility

2. Geographic Queries: Finding Things Nearby

-- Find all Spotify users within 5km of Stanford (BigQuery)
SELECT 
  u.name,
  u.email,
  -- ST_DISTANCE: Calculate distance between two geographic points in meters
  ST_DISTANCE(u.location, ST_GEOGPOINT(-122.1697, 37.4275)) AS distance_meters
FROM Users u
WHERE 
  -- ST_DWITHIN: Check if two points are within a specified distance
  ST_DWITHIN(
    u.location, 
    ST_GEOGPOINT(-122.1697, 37.4275),  -- Create a point from longitude, latitude
    5000  -- Distance in meters
  )
ORDER BY distance_meters;

3. Generative AI Queries: Semantic Understanding

-- Analyze song lyrics and reviews with Gemini Pro (BigQuery)
SELECT 
  title,
  artist,
  -- AI.CLASSIFY: Automatically categorize the song vibe
  AI.CLASSIFY(lyrics, ["Happy", "Melancholic", "Energetic", "Chill"]) AS vibe,

  -- AI.GENERATE: Use an LLM to write a marketing summary
  AI.GENERATE(
    "Summarize the emotional journey of these lyrics IN one sentence", 
    lyrics
  ) AS ai_summary,

  -- AI.TRANSLATE: Localize the title for global markets
  AI.TRANSLATE(title, "Spanish") AS title_es
FROM Songs
-- AI.IF: Semantic filtering based on natural language meaning
WHERE AI.IF("The song IS about a summer romance", lyrics)
  AND play_count > 100000;

4. User-Defined Functions: Custom Logic

Create custom functions to transform your data: Add business logic directly in SQL.

Use Custom Functions

-- Score individual songs with custom functions
SELECT 
  s.title,
  s.artist,

  -- Secret algorithm for viral potential
  calculateViralScore(
    s.play_count, 
    s.skip_rate,
    s.share_count
  ) AS viral_potential,

  -- Analyze album art mood  
  getDominantColor(
    s.album_cover_rgb
  ) AS mood_color,

  s.play_count
FROM Songs s
WHERE s.release_year = 2024
  AND s.play_count > 1000
ORDER BY viral_potential DESC
LIMIT 100;

Define Functions

-- Secret viral scoring algorithm
CREATE TEMP FUNCTION calculateViralScore(
  plays INT64, skip_rate FLOAT64, shares INT64
)
RETURNS FLOAT64
AS (
  plays * 0.3 + shares * 10 - skip_rate * 500
);

-- JavaScript for complex logic
CREATE TEMP FUNCTION getDominantColor(
  rgb ARRAY<STRUCT<r INT64, g INT64, b INT64>>
)
RETURNS STRING
LANGUAGE js AS r"""
  // Average RGB VALUES
  let r = 0, g = 0, b = 0;
  for (let pixel of rgb) {
    r += pixel.r;
    g += pixel.g; 
    b += pixel.b;
  }
  r /= rgb.length;
  g /= rgb.length;
  b /= rgb.length;

  // Return dominant color
  if (r > g && r > b) return 'Warm';
  if (b > r && b > g) return 'Cool';
  return 'Neutral';
""";

Key Concepts:


Same SQL, New Superpowers

The Pattern Stays the Same

Notice how every query above still uses the SQL you know: - SELECT columns or expressions - FROM tables (even if they contain exotic data) - WHERE conditions (using special functions) - ORDER BY to sort results

The core SQL structure never changes - we just add new functions for new data types!


Try It Yourself: BigQuery Public Datasets

-- Real queries you can run today (free tier)!

-- 1. Geographic: Find all Starbucks in San Francisco
SELECT name, address, geom
FROM `bigquery-public-data.geo_openstreetmap.planet_features`
WHERE feature_type = 'amenity' 
  AND tags['amenity'] = 'cafe'
  AND tags['name'] LIKE '%Starbucks%'
  AND ST_DWITHIN(geom, ST_GEOGPOINT(-122.4194, 37.7749), 10000);

-- 2. Genomic: Analyze COVID variants
SELECT variant_name, country, date, mutation_count
FROM `bigquery-public-data.covid19_genome_sequence.variants`
WHERE mutation_count > 10
ORDER BY date DESC;

-- 3. Weather: Temperature patterns
SELECT 
  name AS station_name,
  MAX(temp) AS max_temp,
  MIN(temp) AS min_temp,
  AVG(wind) AS avg_wind
FROM `bigquery-public-data.noaa_gsod.gsod2024`
WHERE mo = '01' AND da = '01'
GROUP BY name
LIMIT 10;

The Evolution Continues

SQL Through the Decades:
1970s: Numbers and text
1990s: + Dates and binary data
2000s: + XML and spatial
2010s: + JSON and arrays  
2020s: + GenAI, Vectors, and ML
2030s: + Quantum data?

Your SQL skills adapt to whatever comes next!