Case Study 2.1: How UberEats and Google Maps Accelerate Geo Queries with RAM Hashing
The Mechanics of Speeding Up Geo Queries
Objective: Master the art of building indices on multidimensional data.
In the world of food delivery and mapping, speed isn't just a luxuryâit's a necessity. The ability to group nearby orders and assign them efficiently to drivers can make or break operational success. Imagine a list of orders with coordinates like these:
orders = [(37.788022, -122.399797), (37.788122, -122.399797), (37.788022, -122.398897)]
Geo data isn't about just finding a point on a map; it's about finding points that are close to each other in two dimensionsâlatitude and longitude. This requires more than a simple one-dimensional search key. It demands a sophisticated approach to hashing and sorting.
Geo Hashing
Geo hashing converts geographic locations into compact strings of letters and digits. Here's the breakdown:
-
Grid Division: The earth is divided into a grid of squares, each tagged with a unique geo hash.
-
Recursive Sub-division: Each square is further divided into smaller squares until the desired precision is achieved.
Example: Take (37.788022, -122.399797) with a 5-character precision (~2.8 km). The geo hash is "9q8yy". This string allows for rapid identification of nearby locations by slight variations.
Hexagon-based Hashing (Uber H3)
Enter hexagon-based hashing with H3 cellsâa more refined approach. H3 uses hexagons to partition the earth's surface, offering various resolutions (see H3 vs S2 comparison).
Why Hexagons?
Hexagons ensure uniform distance from the center to all neighboring centers, making them perfect for radius-based searches.
Uber H3 in Practice:
At zoom level 8, close locations share the same cell ID. At zoom level 12, even buildings on a campus might differ by just a few characters.
Here's how h3.geo_to_h3 converts coordinates into hexagonal cell IDs:
import h3
# lat/lngs FROM SF mission, Stanford NVidia, Packard, Gates Buildings
locations = [(37.788022, -122.399797), (37.428226, -122.174722),
(37.429749, -122.1735490), (37.429761, -122.173290)]
for zoom IN [8, 12]: # zoomlevels 8 AND 12 (imagine the +/- IN map zoom)
for l IN locations:
cell_id = h3.geo_to_h3(l[0], l[1], zoom)
print(f"@zoom[{zoom}]: {cell_id}")
Output:
@zoom[8]: 88283082a1fffff
@zoom[8]: 8828347417fffff
@zoom[8]: 8828347417fffff
@zoom[8]: 8828347417fffff
@zoom[12]: 8c283082a06bbff
@zoom[12]: 8c28347416257ff
@zoom[12]: 8c2834741602bff
@zoom[12]: 8c28347416021ff
At zoom level 8, all Stanford locations share a cell ID. At zoom level 12, Packard and Gates differ by three characters, while NVidia differs by five. SF Mission, being further, has a unique ID. Uber Eats and similar apps select zoom levels based on precision needs.
Beyond Geo Hashing
1. Perceptual Image Hashing (PhotoDNA)
Microsoft's PhotoDNA creates hashes that recognize images despite resizing or color changes. Alternatives like imagehash offer similar capabilities.
2. DNA Sequence Hashing
DNA sequence hashing uses Window Hashing, breaking sequences into overlapping substrings (k-mers), then hashing them (e.g., SHA-256) for quick partitioning and indexing.
Conclusion: By converting complex multidimensional data into hash values, we streamline data indexing and retrieval, cutting down on IO operations.