Case Study 2.1: How UberEats and Google Maps Accelerate Geo Queries with RAM Hashing

Case Study 2.1 Reading Time: 6 mins

The Mechanics of Speeding Up Geo Queries

Objective: Master the art of building indices on multidimensional data.

In the world of food delivery and mapping, speed isn't just a luxury—it's a necessity. The ability to group nearby orders and assign them efficiently to drivers can make or break operational success. Imagine a list of orders with coordinates like these: orders = [(37.788022, -122.399797), (37.788122, -122.399797), (37.788022, -122.398897)]

Geo data isn't about just finding a point on a map; it's about finding points that are close to each other in two dimensions—latitude and longitude. This requires more than a simple one-dimensional search key. It demands a sophisticated approach to hashing and sorting.


Geo Hashing

Geo hashing converts geographic locations into compact strings of letters and digits. Here's the breakdown:

  1. Grid Division: The earth is divided into a grid of squares, each tagged with a unique geo hash.

  2. Recursive Sub-division: Each square is further divided into smaller squares until the desired precision is achieved.

Example: Take (37.788022, -122.399797) with a 5-character precision (~2.8 km). The geo hash is "9q8yy". This string allows for rapid identification of nearby locations by slight variations.


Hexagon-based Hashing (Uber H3)

Enter hexagon-based hashing with H3 cells—a more refined approach. H3 uses hexagons to partition the earth's surface, offering various resolutions (see H3 vs S2 comparison).

Why Hexagons?

Hexagons ensure uniform distance from the center to all neighboring centers, making them perfect for radius-based searches.

Uber H3 in Practice:

At zoom level 8, close locations share the same cell ID. At zoom level 12, even buildings on a campus might differ by just a few characters.

Map of SF and Stanford locations

Click to expand: Map of SF and Stanford locations

Here's how h3.geo_to_h3 converts coordinates into hexagonal cell IDs:

import h3
# lat/lngs FROM SF mission, Stanford NVidia, Packard, Gates Buildings
locations = [(37.788022, -122.399797), (37.428226, -122.174722),
             (37.429749, -122.1735490), (37.429761, -122.173290)]

for zoom IN [8, 12]: # zoomlevels 8 AND 12 (imagine the +/- IN map zoom)
    for l IN locations:
        cell_id = h3.geo_to_h3(l[0], l[1], zoom)
        print(f"@zoom[{zoom}]: {cell_id}")

Output:

@zoom[8]: 88283082a1fffff
@zoom[8]: 8828347417fffff
@zoom[8]: 8828347417fffff
@zoom[8]: 8828347417fffff
@zoom[12]: 8c283082a06bbff
@zoom[12]: 8c28347416257ff
@zoom[12]: 8c2834741602bff
@zoom[12]: 8c28347416021ff

At zoom level 8, all Stanford locations share a cell ID. At zoom level 12, Packard and Gates differ by three characters, while NVidia differs by five. SF Mission, being further, has a unique ID. Uber Eats and similar apps select zoom levels based on precision needs.


Beyond Geo Hashing

1. Perceptual Image Hashing (PhotoDNA)

Microsoft's PhotoDNA creates hashes that recognize images despite resizing or color changes. Alternatives like imagehash offer similar capabilities.

2. DNA Sequence Hashing

DNA sequence hashing uses Window Hashing, breaking sequences into overlapping substrings (k-mers), then hashing them (e.g., SHA-256) for quick partitioning and indexing.

Conclusion: By converting complex multidimensional data into hash values, we streamline data indexing and retrieval, cutting down on IO operations.