Case Study 5: Ride‑sharing (Uber/Grab)

A ride-hailing application with real-time matching, tracking, pricing, and payment.


Step 1: Gather Requirements

Functional requirements

  • Ride request: A rider requests a ride with pickup and dropoff locations.
  • Driver matching: Find nearby available drivers.
  • Real-time tracking: Track driver location and ETA.
  • Pricing: Fare estimation, surge pricing.
  • Payment: Cashless payment, tipping.
  • Rating: Driver/rider ratings.
  • Ride history: Past trips, receipts.

Non‑functional requirements

  • Low latency: Match < 5s, real-time tracking < 1s update.
  • High availability: 99.9% uptime.
  • Scalability: Millions of concurrent rides and millions of drivers.
  • Consistency: Pricing and payment must be consistent.
  • Location accuracy: < 10m error.

Scale estimation

  • Cities: 100 cities worldwide.
  • Drivers: 5 million active drivers.
  • Riders: 50 million active riders.
  • Rides per day: 20 million.
  • Location updates: 50 million/minute (drivers sending GPS).

Step 2: Estimation

Traffic estimates

  • Ride requests: 20M / 86400 ≈ 230 RPS (average).
  • Location updates: 50M / 60s ≈ 830,000 RPS.
  • Match requests: ~230 RPS (one per ride request).
  • Peak RPS: ~5x average → ~4 million location updates/s.

Storage estimates

  • Ride data: 20M/day * 1 KB ≈ 20 GB/day → ~36 TB (5 years).
  • Location history: 50M/min * 100 bytes * 60 * 24 ≈ 7 TB/day.
  • User data: 55M users * 1 KB ≈ 55 GB.
  • Total 5 years: ~13 PB (mostly location history: 7 TB/day * 365 * 5).

Bandwidth estimates

  • Location upload: 830k RPS * 100 bytes ≈ 83 MB/s.
  • Tracking download: Fan-out to riders ≈ 500 MB/s.
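
The traffic, storage, and upload figures above can be re-derived with a few lines of arithmetic. The sketch below is only a sanity check: the variable names are illustrative, and it assumes decimal units (1 KB = 1,000 bytes) and a 365-day year.

```python
# Back-of-envelope check of the estimates, using the inputs from the scale section.
SECONDS_PER_DAY = 86_400
MINUTES_PER_DAY = 1_440

rides_per_day = 20_000_000
location_updates_per_minute = 50_000_000
bytes_per_ride = 1_000          # assumed: 1 KB = 1,000 bytes
bytes_per_location = 100
peak_factor = 5
retention_days = 5 * 365

ride_rps = rides_per_day / SECONDS_PER_DAY
location_rps = location_updates_per_minute / 60
peak_location_rps = location_rps * peak_factor

ride_storage_tb = rides_per_day * bytes_per_ride * retention_days / 1e12
location_tb_per_day = (
    location_updates_per_minute * bytes_per_location * MINUTES_PER_DAY / 1e12
)
location_storage_pb = location_tb_per_day * retention_days / 1e3
upload_mb_per_s = location_rps * bytes_per_location / 1e6

print(f"ride requests:      {ride_rps:,.0f} RPS")
print(f"location updates:   {location_rps:,.0f} RPS (peak {peak_location_rps:,.0f})")
print(f"ride storage:       {ride_storage_tb:.1f} TB over 5 years")
print(f"location history:   {location_tb_per_day:.1f} TB/day, "
      f"{location_storage_pb:.1f} PB over 5 years")
print(f"location upload:    {upload_mb_per_s:.0f} MB/s")
```

Changing `peak_factor` or `bytes_per_location` shows quickly how sensitive the totals are to those two guesses.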

Step 3: High‑Level Design

Main components

┌──────────┐     ┌─────────────┐     ┌──────────────┐
│  Client  │ ──→ │ Load        │ ──→ │ API Gateway  │
│ (Mobile) │     │ Balancer    │     │              │
└──────────┘     └─────────────┘     └──────────────┘
                                            │
        ┌───────────────────────────────────┼───────────────────────────────────┐
        ▼                                   ▼                                   ▼
┌───────────────┐                  ┌───────────────┐                  ┌───────────────┐
│   Ride        │                  │   Location    │                  │    Dispatch   │
│   Service     │                  │   Service     │                  │    Service    │
└───────────────┘                  └───────────────┘                  └───────────────┘
        │                                   │                                   │
        ▼                                   ▼                                   ▼
┌───────────────┐                  ┌───────────────┐                  ┌───────────────┐
│  PostgreSQL   │                  │   Redis +     │                  │    Kafka      │
│   (Rides)     │                  │  GeoIndex     │                  │   (Events)    │
└───────────────┘                  └───────────────┘                  └───────────────┘
        │
        ▼
┌───────────────┐
│   Payment     │
│   Service     │
└───────────────┘

Technology selection

  • Mobile: iOS (Swift), Android (Kotlin) with real-time updates.
  • API Gateway: Kong with WebSocket support.
  • Location Service: Redis GeoHash or Elasticsearch geo queries.
  • Database: PostgreSQL for rides and users; Cassandra for location history.
  • Cache: Redis for driver locations and availability.
  • Message Queue: Kafka for location streams and ride events.
  • Real-time: WebSocket for driver tracking.

Step 4: Detailed Design

Database Schema

Table: rides

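| Column      | Type          | Description                                      |
|-------------|---------------|--------------------------------------------------|
| ride_id     | UUID          | Primary key                                      |
| rider_id    | BIGINT        | Foreign key                                      |
| driver_id   | BIGINT        | Foreign key (nullable until a driver is matched) |
| pickup_lat  | DECIMAL(9,6)  | Pickup location                                  |
| pickup_lng  | DECIMAL(9,6)  | Pickup location                                  |
| dropoff_lat | DECIMAL(9,6)  | Dropoff location                                 |
| dropoff_lng | DECIMAL(9,6)  | Dropoff location                                 |
| status      | SMALLINT      | Requested/Matched/InProgress/Completed/Cancelled |
| fare        | DECIMAL(10,2) | Final fare                                       |
| created_at  | TIMESTAMP     | Request time                                     |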

Table: drivers

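| Column       | Type         | Description             |
|--------------|--------------|-------------------------|
| driver_id    | BIGINT       | Primary key             |
| user_id      | BIGINT       | Foreign key             |
| vehicle_info | JSON         | Car model, plate, color |
| rating       | DECIMAL(3,2) | Average rating          |
| status       | SMALLINT     | Available/Busy/Offline  |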

Table: locations (Time-series, partitioned)

| Column    | Type         | Description    |
|-----------|--------------|----------------|
| driver_id | BIGINT       | Partition key  |
| timestamp | TIMESTAMP    | Clustering key |
| lat       | DECIMAL(9,6) | GPS latitude   |
| lng       | DECIMAL(9,6) | GPS longitude  |
Partition by: DATE(timestamp)

Geospatial Indexing

GeoHash: Encodes a (lat, lng) pair into a string; a longer string means a smaller cell.

  • Precision 6: ~1.2km x 600m.
  • Precision 8: ~38m x 19m.
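# Redis Geo commands (longitude comes first, then latitude)
GEOADD drivers:online -122.4194 37.7749 driver_123
GEORADIUS drivers:online -122.4194 37.7749 5 km ASC COUNT 10
# GEORADIUS is deprecated as of Redis 6.2; the equivalent GEOSEARCH call is:
GEOSEARCH drivers:online FROMLONLAT -122.4194 37.7749 BYRADIUS 5 km ASC COUNT 10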
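
To make the encoding concrete, here is a minimal sketch of the standard GeoHash algorithm: bits alternate between longitude and latitude (longitude first), each bit halves the remaining range, and every 5 bits become one base-32 character. The function name `geohash_encode` is illustrative and not part of any library mentioned in this chapter.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lng: float, precision: int = 6) -> str:
    """Encode a coordinate as a GeoHash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = (bits << 1) | 1
                lng_lo = mid
            else:
                bits <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


# Nearby points share a prefix, which is what makes prefix lookups useful.
print(geohash_encode(37.7749, -122.4194, precision=6))
print(geohash_encode(37.7750, -122.4195, precision=6))
```

Because a shorter hash is always a prefix of the longer one for the same point, truncating a stored hash is enough to move to a coarser cell.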

API Design

POST /api/v1/rides/request
{
  "pickup": { "lat": 37.7749, "lng": -122.4194 },
  "dropoff": { "lat": 37.7849, "lng": -122.4094 },
  "ride_type": "uberx"
}

Response:
{
  "ride_id": "uuid",
  "estimated_fare": 15.50,
  "eta": 300
}

POST /api/v1/drivers/location
{
  "lat": 37.7749,
  "lng": -122.4194,
  "heading": 180
}

GET /api/v1/rides/{ride_id}/tracking

Response:
{
  "driver": { "name": "John", "vehicle": "Toyota Camry" },
  "location": { "lat": 37.7750, "lng": -122.4195 },
  "eta": 180
}

Data Flow

Request Ride:

  1. The rider sends POST /api/v1/rides/request.
  2. Ride Service creates the ride with status REQUESTED.
  3. The fare is calculated (base + distance + time + surge).
  4. Dispatch Service finds nearby drivers (5 km radius).
  5. It filters for available drivers and ranks them by distance and rating.
  6. The ride request is sent to the top-ranked drivers via push notification.
  7. The first driver to accept is matched.
  8. The ride status is updated to MATCHED and the rider is notified.
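
The fare step in this flow can be sketched as follows. This is an illustration under stated assumptions: the rates (`base`, `per_km`, `per_min`) are made-up defaults, surge is treated as a multiplier on the whole fare, and straight-line (haversine) distance stands in for a real routing engine.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two coordinates in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def estimate_fare(distance_km: float, duration_min: float, surge: float = 1.0,
                  base: float = 2.50, per_km: float = 1.20,
                  per_min: float = 0.30) -> float:
    """Fare = (base + distance charge + time charge) * surge multiplier.

    All rates are hypothetical defaults chosen for the example.
    """
    subtotal = base + per_km * distance_km + per_min * duration_min
    return round(subtotal * surge, 2)


# Pickup and dropoff taken from the request example in the API section.
distance = haversine_km(37.7749, -122.4194, 37.7849, -122.4094)
print(f"straight-line distance: {distance:.2f} km")
print("estimated fare:", estimate_fare(distance, duration_min=6, surge=1.2))
```

A production estimate would use road distance and predicted travel time from a routing service rather than the straight-line figure.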

Driver Location Update:

  1. The driver app sends its location every 3 s over WebSocket.
  2. Location Service receives and validates the update.
  3. It updates the Redis geo index drivers:online.
  4. The update is streamed to the Kafka topic driver-locations.
  5. LocationHistory Service consumes the stream and stores it in Cassandra.
  6. If the driver has an active ride, the update is pushed to the rider's tracking view in real time.
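
The update path above can be sketched with in-memory stand-ins, which keeps the example runnable without Redis or Kafka. Everything here is hypothetical scaffolding: the `online` dict plays the role of the `drivers:online` geo index, the `stream` list plays the role of the `driver-locations` topic, and `trackers` maps a driver on an active ride to a callback that pushes to the rider.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class LocationUpdate:
    driver_id: int
    lat: float
    lng: float
    heading: float    # degrees clockwise from north
    timestamp: float  # Unix time in seconds


def is_valid(update: LocationUpdate) -> bool:
    """Reject coordinates and headings that are out of range."""
    return (-90.0 <= update.lat <= 90.0
            and -180.0 <= update.lng <= 180.0
            and 0.0 <= update.heading < 360.0)


class LocationPipeline:
    def __init__(self) -> None:
        self.online: Dict[int, Tuple[float, float]] = {}   # stand-in for the geo index
        self.stream: List[LocationUpdate] = []             # stand-in for the event topic
        self.trackers: Dict[int, Callable[[LocationUpdate], None]] = {}

    def handle(self, update: LocationUpdate) -> bool:
        """Validate, index, stream, and fan out one update. Returns False if rejected."""
        if not is_valid(update):
            return False
        self.online[update.driver_id] = (update.lat, update.lng)
        self.stream.append(update)
        notify = self.trackers.get(update.driver_id)
        if notify is not None:  # only drivers on an active ride have a tracker
            notify(update)
        return True


pipeline = LocationPipeline()
pipeline.trackers[123] = lambda u: print(f"rider sees driver at {u.lat}, {u.lng}")
pipeline.handle(LocationUpdate(123, 37.7750, -122.4195, 180.0, 1_700_000_000.0))
```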

Matching Algorithm:

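from dataclasses import dataclass

@dataclass
class Driver:
    driver_id: int
    rating: float       # average rating on a 0-5 scale
    eta_seconds: float  # estimated time to reach the pickup point
    available: bool

def find_best_driver(nearby_drivers, max_eta_seconds=600.0):
    """Pick the best driver from the 5 km geo search results, or None."""
    available = [d for d in nearby_drivers if d.available]

    def score(driver):
        # Scale both terms to [0, 1] so the 0.4/0.6 weights are comparable;
        # max_eta_seconds is an assumed cap, not a value from the design.
        rating_term = driver.rating / 5.0
        eta_term = min(driver.eta_seconds, max_eta_seconds) / max_eta_seconds
        return 0.4 * rating_term - 0.6 * eta_term

    # A higher score is better, so take the maximum.
    return max(available, key=score, default=None)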

Surge Pricing:

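def calculate_surge_multiplier(demand, supply):
    """demand: ride requests in the city over the last 10 minutes;
    supply: drivers currently available in the same city."""
    if supply <= 0:
        # No available drivers: avoid dividing by zero.
        return 2.0 if demand > 0 else 1.0
    ratio = demand / supply
    if ratio > 2.0:
        return 2.0
    if ratio > 1.5:
        return 1.5
    if ratio > 1.2:
        return 1.2
    return 1.0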

Step 5: Bottlenecks & Optimization

Single Point of Failure

  • Location Service: Multiple instances with consistent hashing.
  • Database: PostgreSQL with replication and failover.
  • Dispatch: Stateless, horizontal scaling.

Scalability Bottlenecks

  • Location writes: 830k RPS → shard by driver_id or by city.
  • Geo queries: Redis cluster with geo-sharding by city.
  • Matching: Parallelize the driver search, time out after 2 s.
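
One way to shard location writes by driver_id is a consistent hash ring, so that adding an instance moves only a fraction of the drivers. The sketch below is a generic ring with virtual nodes; the class name, the node names (`loc-1` and so on), and the choice of 100 virtual nodes per instance are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        # Each node is placed on the ring `vnodes` times to even out the load.
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position on the ring."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


before = ConsistentHashRing(["loc-1", "loc-2", "loc-3"])
after = ConsistentHashRing(["loc-1", "loc-2", "loc-3", "loc-4"])
drivers = [f"driver_{i}" for i in range(1000)]
moved = [d for d in drivers if before.node_for(d) != after.node_for(d)]
print(f"{len(moved)} of {len(drivers)} drivers moved after adding loc-4")
```

With plain modulo hashing (`hash(driver_id) % n`), most keys would change owner when `n` changes; on the ring, only keys that now fall to the new node move.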

Performance Optimization

  • Caching:
    • Driver locations in Redis (TTL 30s).
    • Fare estimates cached for common routes.
  • Batching: Location updates batched every 3-5 s.
  • Async processing:
    • Payment processing async.
    • Rating updates async.
  • Connection pooling: Database connection pools.
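
The 30-second TTL on driver locations means a driver who stops reporting simply drops out of the cache. In Redis this would normally be handled by key expiry; the in-memory sketch below only illustrates the behaviour. The class name and the injectable `clock` parameter are assumptions made so the example can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DriverLocationCache:
    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # driver_id -> (lat, lng, expiry time on the injected clock)
        self._entries: Dict[int, Tuple[float, float, float]] = {}

    def put(self, driver_id: int, lat: float, lng: float) -> None:
        """Store the latest position and restart the TTL."""
        self._entries[driver_id] = (lat, lng, self._clock() + self._ttl)

    def get(self, driver_id: int) -> Optional[Tuple[float, float]]:
        """Return (lat, lng), or None if the entry is missing or expired."""
        entry = self._entries.get(driver_id)
        if entry is None:
            return None
        lat, lng, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[driver_id]  # lazy eviction on read
            return None
        return lat, lng


cache = DriverLocationCache(ttl_seconds=30.0)
cache.put(123, 37.7749, -122.4194)
print(cache.get(123))
```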

Edge Cases

  • Driver offline during ride: Re-match the rider with another driver.
  • No drivers available: Queue the ride and notify the rider.
  • GPS inaccuracy: Snap to road, use last known good location.
  • Payment failure: Retry logic, fallback to cash.
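
The payment-failure case can be sketched as a retry loop with exponential backoff that falls back to cash once the attempts are exhausted. `PaymentError`, `charge_with_retry`, and the `charge` callable are hypothetical names, and the attempt count and delays are assumed values, not figures from this design.

```python
import time
from typing import Callable, Dict, Optional


class PaymentError(Exception):
    """Raised by the (hypothetical) payment gateway call on a failed charge."""


def charge_with_retry(charge: Callable[[float], str], amount: float,
                      max_attempts: int = 3, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep
                      ) -> Dict[str, Optional[str]]:
    """Try to charge the card; fall back to cash when every attempt fails."""
    for attempt in range(max_attempts):
        try:
            receipt = charge(amount)
            return {"method": "card", "receipt": receipt}
        except PaymentError:
            if attempt < max_attempts - 1:
                # Exponential backoff: 0.5 s, 1 s, 2 s, ...
                sleep(base_delay * 2 ** attempt)
    return {"method": "cash", "receipt": None}


def always_declined(amount: float) -> str:
    raise PaymentError("card declined")


print(charge_with_retry(always_declined, 15.50, sleep=lambda seconds: None))
```

In a real system each retry should carry the same idempotency key, so that a charge that succeeded but timed out is not applied twice.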

Step 6: Trade‑offs

Consistency vs Availability

  • CP system for pricing and payment: strong consistency.
  • AP system for location tracking: eventual consistency (a delay of a few seconds is acceptable).

Latency vs Accuracy

  • Low-latency matching: Match with the nearest driver within 2 s.
  • Better match: Wait 5-10 s to find a better driver, trading latency for match quality.

Precision vs Cost

  • High precision (GeoHash 8): ~20 m accuracy, more data.
  • Lower precision (GeoHash 6): ~1 km accuracy, less data.
  • Solution: Dynamic precision based on density.
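
One possible reading of "dynamic precision" is to start with coarse cells and split only the crowded ones into finer cells. The sketch below does that over a list of precision-8 GeoHash strings. The function name, the `max_per_cell` threshold, and the sample hashes are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List


def adaptive_cells(hashes: List[str], max_per_cell: int = 2,
                   min_precision: int = 6, max_precision: int = 8) -> Dict[str, int]:
    """Group drivers into GeoHash cells, refining cells that hold too many drivers.

    Returns a mapping of cell prefix -> driver count.
    """
    cells: Dict[str, int] = {}

    def split(group: List[str], precision: int) -> None:
        buckets = Counter(h[:precision] for h in group)
        for prefix, count in buckets.items():
            if count > max_per_cell and precision < max_precision:
                # Dense cell: look one character deeper (32x smaller cells).
                split([h for h in group if h.startswith(prefix)], precision + 1)
            else:
                cells[prefix] = count

    split(hashes, min_precision)
    return cells


# Three drivers cluster in one precision-6 cell; one driver is alone in another.
driver_hashes = ["9q8yyk8y", "9q8yyk8z", "9q8yyk9b", "9q8zn0b1"]
print(adaptive_cells(driver_hashes))
```

Sparse areas stay at the coarse precision, so the index stores fewer cells there, while dense areas get the finer resolution they need.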

Real-time vs Batch

  • Real-time: Location streaming, instant matching.
  • Batch: Analytics, surge pricing calculation, driver incentives.

Conclusion

A ride-sharing system is a complex distributed system with these challenges:

  • Real-time geospatial processing: Millions of location updates per second.
  • Matching algorithm: Balance between latency and quality.
  • Dynamic pricing: Supply/demand balancing.
  • High availability: Critical for the driver and rider experience.
  • Payment integrity: Consistent, secure transactions.
