Case Study 5: Ride‑sharing (Uber/Grab)
A ride-hailing application with real-time matching, tracking, pricing, and payment.
Step 1: Gather requirements
Functional requirements
- Ride request: A rider requests a ride with pickup and dropoff locations.
- Driver matching: Find nearby available drivers.
- Real-time tracking: Track the driver's location and ETA.
- Pricing: Fare estimation, surge pricing.
- Payment: Cashless payment, tipping.
- Rating: Driver/rider ratings.
- Ride history: Past trips, receipts.
Non‑functional requirements
- Low latency: Match within 5 s; tracking updates reach the rider within 1 s.
- High availability: 99.9% uptime.
- Scalability: Millions of concurrent rides, millions of drivers.
- Consistency: Pricing and payment must be consistent.
- Location accuracy: < 10m error.
Scale estimation
- Cities: 100 cities worldwide.
- Drivers: 5 million active drivers.
- Riders: 50 million active riders.
- Rides per day: 20 million.
- Location updates: 50 million/minute (drivers sending GPS; at one update every 3 s, this corresponds to about 2.5 million drivers online at once).
Step 2: Estimation
Traffic estimates
- Ride requests: 20M / 86,400 s ≈ 230 RPS (average).
- Location updates: 50M / 60 s ≈ 830,000 RPS.
- Match requests: ~230 RPS (one per ride request).
- Peak RPS: ~5x average → about 4 million location updates/s and about 1,150 ride requests/s.
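The traffic arithmetic above can be reproduced with a few lines of Python. This is only a back-of-envelope sketch: the input figures come from the scale estimation, and the constant names are chosen here for illustration.

```python
# Back-of-envelope traffic estimate using the figures from the scale estimation.
RIDES_PER_DAY = 20_000_000
LOCATION_UPDATES_PER_MIN = 50_000_000
PEAK_FACTOR = 5            # peak-to-average ratio assumed in the text
SECONDS_PER_DAY = 86_400

ride_rps = RIDES_PER_DAY / SECONDS_PER_DAY        # average ride requests per second
location_rps = LOCATION_UPDATES_PER_MIN / 60      # average location updates per second
peak_ride_rps = ride_rps * PEAK_FACTOR
peak_location_rps = location_rps * PEAK_FACTOR

print(f"ride requests:    {ride_rps:,.0f} RPS avg, {peak_ride_rps:,.0f} RPS peak")
print(f"location updates: {location_rps:,.0f} RPS avg, {peak_location_rps:,.0f} RPS peak")
```

Location updates outnumber ride requests by more than three orders of magnitude, which is why the write path for locations drives most of the later design choices.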
Storage estimates
- Ride data: 20M/day * 1 KB ≈ 20 GB/day → ~36 TB (5 years).
- Location history: 50M/min * 100 bytes * 60 * 24 ≈ 7 TB/day.
- User data: 55M users * 1 KB ≈ 55 GB.
- Total 5 years: ~13 PB (7.2 TB/day * 365 * 5, almost entirely location history).
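As a cross-check of the storage figures, the sketch below recomputes them from the per-record sizes given above. It assumes decimal units (1 KB = 1,000 bytes) and a 365-day year; with these inputs the five-year total comes out at roughly 13 PB.

```python
# Storage estimate from the per-record sizes stated above (decimal units assumed).
GB, TB, PB = 10**9, 10**12, 10**15
DAYS = 5 * 365

ride_bytes_per_day = 20_000_000 * 1_000              # 20M rides/day * 1 KB
location_bytes_per_day = 50_000_000 * 100 * 60 * 24  # 50M/min * 100 B * minutes/day
user_bytes = 55_000_000 * 1_000                      # 55M users * 1 KB

ride_bytes_5y = ride_bytes_per_day * DAYS
location_bytes_5y = location_bytes_per_day * DAYS
total_bytes_5y = ride_bytes_5y + location_bytes_5y + user_bytes

print(f"rides:     {ride_bytes_per_day / GB:.0f} GB/day, {ride_bytes_5y / TB:.1f} TB over 5 years")
print(f"locations: {location_bytes_per_day / TB:.1f} TB/day, {location_bytes_5y / PB:.2f} PB over 5 years")
print(f"total:     {total_bytes_5y / PB:.2f} PB over 5 years")
```

Because location history dominates, retention policy and downsampling of old GPS points matter far more for cost than anything in the rides or users tables.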
Bandwidth estimates
- Location upload: 830k RPS * 100 bytes ≈ 83 MB/s.
- Tracking download: Fan-out to riders ≈ 500 MB/s.
Step 3: High-Level Design
Main components
┌──────────┐     ┌─────────────┐     ┌──────────────┐
│  Client  │ ──→ │    Load     │ ──→ │ API Gateway  │
│ (Mobile) │     │  Balancer   │     │              │
└──────────┘     └─────────────┘     └──────────────┘
                                            │
        ┌───────────────────────────────────┼───────────────────────────────────┐
        ▼                                   ▼                                   ▼
┌───────────────┐                   ┌───────────────┐                   ┌───────────────┐
│     Ride      │                   │   Location    │                   │   Dispatch    │
│    Service    │                   │    Service    │                   │    Service    │
└───────────────┘                   └───────────────┘                   └───────────────┘
        │                                   │                                   │
        ▼                                   ▼                                   ▼
┌───────────────┐                   ┌───────────────┐                   ┌───────────────┐
│  PostgreSQL   │                   │    Redis +    │                   │     Kafka     │
│    (Rides)    │                   │   GeoIndex    │                   │   (Events)    │
└───────────────┘                   └───────────────┘                   └───────────────┘
                                                                                │
                                                                                ▼
                                                                        ┌───────────────┐
                                                                        │    Payment    │
                                                                        │    Service    │
                                                                        └───────────────┘
Technology selection
- Mobile: iOS (Swift), Android (Kotlin) with real-time updates.
- API Gateway: Kong with WebSocket support.
- Location Service: Redis GeoHash or Elasticsearch geo queries.
- Database: PostgreSQL for rides and users; Cassandra for location history.
- Cache: Redis for driver locations and availability.
- Message Queue: Kafka for location streams and ride events.
- Real-time: WebSocket for driver tracking.
Step 4: Detailed Design
Database Schema
Table: rides
| Column | Type | Description |
|---|---|---|
| ride_id | UUID | Primary key |
| rider_id | BIGINT | Foreign key |
| driver_id | BIGINT | Foreign key (nullable until a driver is matched) |
| pickup_lat | DECIMAL(9,6) | Pickup location |
| pickup_lng | DECIMAL(9,6) | Pickup location |
| dropoff_lat | DECIMAL(9,6) | Dropoff location |
| dropoff_lng | DECIMAL(9,6) | Dropoff location |
| status | SMALLINT | Requested/Matched/InProgress/Completed/Cancelled (PostgreSQL has no TINYINT) |
| fare | DECIMAL(10,2) | Final fare |
| created_at | TIMESTAMP | Request time |
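The status column implies a small state machine. The sketch below shows one way to guard status changes in application code. The integer codes and the set of allowed transitions are assumptions made for illustration; the schema above only names the five states.

```python
from enum import IntEnum


class RideStatus(IntEnum):
    # Integer codes are illustrative; the schema only lists the state names.
    REQUESTED = 0
    MATCHED = 1
    IN_PROGRESS = 2
    COMPLETED = 3
    CANCELLED = 4


# Assumed transition table: terminal states have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    RideStatus.REQUESTED: {RideStatus.MATCHED, RideStatus.CANCELLED},
    # MATCHED -> REQUESTED covers a driver dropping out before pickup (re-match).
    RideStatus.MATCHED: {RideStatus.IN_PROGRESS, RideStatus.REQUESTED, RideStatus.CANCELLED},
    RideStatus.IN_PROGRESS: {RideStatus.COMPLETED, RideStatus.CANCELLED},
    RideStatus.COMPLETED: set(),
    RideStatus.CANCELLED: set(),
}


def transition(current: RideStatus, new: RideStatus) -> RideStatus:
    """Return the new status, or raise ValueError if the change is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new
```

Rejecting illegal transitions in one place keeps a late or duplicated event (for example, a second "accept" from another driver) from overwriting a ride that has already moved on.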
Table: drivers
| Column | Type | Description |
|---|---|---|
| driver_id | BIGINT | Primary key |
| user_id | BIGINT | Foreign key |
| vehicle_info | JSON | Car model, plate, color |
| rating | DECIMAL(3,2) | Average rating |
| status | SMALLINT | Available/Busy/Offline |
Table: locations (Time-series, partitioned)
| Column | Type | Description |
|---|---|---|
| driver_id | BIGINT | Partition key |
| timestamp | TIMESTAMP | Clustering key |
| lat | DECIMAL(9,6) | GPS latitude |
| lng | DECIMAL(9,6) | GPS longitude |

Partitioning: bucket by DATE(timestamp) together with driver_id, so each partition holds one driver's points for one day.
Geospatial Indexing
GeoHash: Encodes a (lat, lng) pair as a short string; nearby points share a common prefix.
- Precision 6: ~1.2 km x 0.6 km cell.
- Precision 8: ~38 m x 19 m cell.
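# Redis Geo commands (longitude comes before latitude)
GEOADD drivers:online -122.4194 37.7749 driver_123
# GEORADIUS is deprecated since Redis 6.2; GEOSEARCH is the replacement.
# ASC returns the nearest drivers first.
GEOSEARCH drivers:online FROMLONLAT -122.4194 37.7749 BYRADIUS 5 km ASC COUNT 10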
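To make the GeoHash encoding concrete, here is a minimal pure-Python encoder following the standard algorithm: bisect the longitude and latitude ranges alternately, then emit one base-32 character per 5 bits. It is a sketch for understanding the precision levels, not what Redis runs internally (Redis stores a 52-bit integer geohash in a sorted set).

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lng: float, precision: int = 8) -> str:
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = (bits << 1) | 1, mid
            else:
                bits, lng_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# A shorter hash is always a prefix of a longer one for the same point,
# which is what makes prefix-based proximity lookups possible.
print(geohash_encode(37.7749, -122.4194, 6))
print(geohash_encode(37.7749, -122.4194, 8))
```

Note that two points on opposite sides of a cell boundary can be close together yet share no prefix, so a proximity search usually checks the target cell plus its eight neighbours.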
API Design
POST /api/v1/rides/request
{
  "pickup": { "lat": 37.7749, "lng": -122.4194 },
  "dropoff": { "lat": 37.7849, "lng": -122.4094 },
  "ride_type": "uberx"
}
Response:
{
  "ride_id": "uuid",
  "estimated_fare": 15.50,
  "eta": 300
}
POST /api/v1/drivers/location
{
  "lat": 37.7749,
  "lng": -122.4194,
  "heading": 180
}
GET /api/v1/rides/{ride_id}/tracking
Response:
{
  "driver": { "name": "John", "vehicle": "Toyota Camry" },
  "location": { "lat": 37.7750, "lng": -122.4195 },
  "eta": 180
}
Data Flow
Request Ride:
- The rider sends POST /rides/request.
- Ride Service creates the ride (REQUESTED).
- Calculate the fare (base + distance + time, with the surge multiplier applied).
- Dispatch Service finds nearby drivers (5 km radius).
- Filter for available drivers and rank them by distance and rating.
- Send the ride request to the top drivers (via push notification).
- The first driver to accept is matched.
- Update the ride status (MATCHED) and notify the rider.
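The fare step in the flow above can be sketched as follows. The rate constants are hypothetical placeholders, and applying the surge multiplier to the whole subtotal is one possible reading of "base + distance + time + surge". `Decimal` is used so that the result matches the DECIMAL(10,2) fare column without floating-point drift.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rates for illustration only; real rates vary by city and ride type.
BASE_FARE = Decimal("2.50")
PER_KM = Decimal("1.20")
PER_MINUTE = Decimal("0.30")


def estimate_fare(distance_km, duration_min, surge=Decimal("1.0")) -> Decimal:
    """Fare = (base + distance charge + time charge) * surge, rounded to cents."""
    subtotal = (
        BASE_FARE
        + PER_KM * Decimal(str(distance_km))
        + PER_MINUTE * Decimal(str(duration_min))
    )
    return (subtotal * Decimal(str(surge))).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_fare(5, 10))                  # 5 km, 10 minutes, no surge
print(estimate_fare(5, 10, Decimal("1.5")))  # same trip with a 1.5x surge
```

Storing the surge multiplier that was quoted alongside the ride makes the final charge reproducible even if demand changes between request and completion.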
Driver Location Update:
- The driver app sends its location every 3 s over WebSocket.
- Location Service receives and validates the update.
- Update the Redis GeoIndex: drivers:online.
- Stream to the Kafka topic driver-locations.
- LocationHistory Service consumes the stream and stores it in Cassandra.
- Push a real-time update to the rider's tracking view (if there is an active ride).
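The receive-and-validate step might look like the sketch below. The field names follow the POST /api/v1/drivers/location payload; the `driver_id` and `timestamp` fields, the staleness threshold, and the in-memory `geo_index` dict and `stream` list (standing in for Redis and Kafka) are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass

MAX_AGE_S = 30        # assumed: drop updates older than this
MAX_CLOCK_SKEW_S = 5  # assumed: tolerate small client clock drift


@dataclass(frozen=True)
class LocationUpdate:
    driver_id: int
    lat: float
    lng: float
    heading: float
    timestamp: float  # Unix seconds, as reported by the driver app


def validate(update: LocationUpdate, now: float) -> list:
    """Return a list of validation errors; an empty list means the update is usable."""
    errors = []
    if not -90.0 <= update.lat <= 90.0:
        errors.append("lat out of range")
    if not -180.0 <= update.lng <= 180.0:
        errors.append("lng out of range")
    if not 0.0 <= update.heading < 360.0:
        errors.append("heading out of range")
    if update.timestamp > now + MAX_CLOCK_SKEW_S:
        errors.append("timestamp in the future")
    elif now - update.timestamp > MAX_AGE_S:
        errors.append("stale update")
    return errors


def handle_update(update: LocationUpdate, now: float, geo_index: dict, stream: list) -> list:
    """Validate, then write to the geo index and the event stream (in-memory stand-ins)."""
    errors = validate(update, now)
    if errors:
        return errors
    geo_index[update.driver_id] = (update.lng, update.lat)  # Redis GEOADD order: lng, lat
    stream.append(update)                                   # stand-in for a Kafka produce
    return []
```

Validating before either write keeps bad GPS fixes out of both the live index and the long-term history.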
Matching Algorithm:
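function findBestDriver(rideRequest):
    pickup = rideRequest.pickup
    nearbyDrivers = geoSearch(pickup, radius=5km)
    availableDrivers = filter(nearbyDrivers, status=AVAILABLE)
    if availableDrivers is empty: return null
    scoredDrivers = map(availableDrivers, driver => {
        eta = calculateETA(driver, pickup)   // seconds
        // Normalize both terms to [0, 1] so the weights are comparable.
        // MAX_ETA is an assumed cap, e.g. the ETA from the edge of the search radius.
        ratingScore = driver.rating / 5.0
        etaScore = min(eta / MAX_ETA, 1.0)
        score = (ratingScore * 0.4) - (etaScore * 0.6)
        return { driver, score }
    })
    // Highest score wins, so sort in descending order.
    return sortBy(scoredDrivers, score, descending).first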
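A runnable version of the matching idea is sketched below. It replaces the geo index with a plain list of drivers and estimates ETA from straight-line (haversine) distance at an assumed average speed. Both terms are normalized to the 0 to 1 range before the 0.4/0.6 weights are applied, because a raw rating (1 to 5) and a raw ETA in seconds are not on comparable scales. The `Driver` class, the speed constant, and the normalization are assumptions, not part of the original design.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

EARTH_RADIUS_KM = 6371.0
SEARCH_RADIUS_KM = 5.0    # search radius from the data flow
ASSUMED_SPEED_KMH = 30.0  # assumed average city speed for a rough ETA


@dataclass
class Driver:
    driver_id: int
    lat: float
    lng: float
    rating: float  # 1.0 to 5.0
    available: bool


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_best_driver(pickup_lat: float, pickup_lng: float,
                     drivers: Iterable[Driver]) -> Optional[Driver]:
    """Return the highest-scoring available driver within the radius, or None."""
    max_eta_s = SEARCH_RADIUS_KM / ASSUMED_SPEED_KMH * 3600
    best, best_score = None, -math.inf
    for driver in drivers:
        if not driver.available:
            continue
        distance = haversine_km(driver.lat, driver.lng, pickup_lat, pickup_lng)
        if distance > SEARCH_RADIUS_KM:
            continue
        eta_s = distance / ASSUMED_SPEED_KMH * 3600
        # Higher rating raises the score; longer ETA lowers it.
        score = 0.4 * (driver.rating / 5.0) - 0.6 * (eta_s / max_eta_s)
        if score > best_score:
            best, best_score = driver, score
    return best
```

A production dispatcher would use road-network ETAs rather than straight-line distance; the structure of the scoring step stays the same.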
Surge Pricing:
function calculateSurgeMultiplier(city, timestamp):
    demand = getRideRequestsLast10Min(city)
    supply = getAvailableDrivers(city)
    // Guard against division by zero when no drivers are available.
    if supply == 0:
        return 2.0 if demand > 0 else 1.0
    ratio = demand / supply
    if ratio > 2.0: return 2.0
    if ratio > 1.5: return 1.5
    if ratio > 1.2: return 1.2
    return 1.0
Step 5: Bottlenecks & Optimization
Single Point of Failure
- Location Service: Multiple instances, with requests routed by consistent hashing.
- Database: PostgreSQL with replication and automatic failover.
- Dispatch: Stateless, scaled horizontally.
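Consistent hashing for the Location Service can be illustrated with a small hash ring. This is a minimal sketch with virtual nodes; the node names, the virtual-node count, and the use of MD5 as the ring hash are illustrative choices, not details from the design above.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self._vnodes = vnodes
        self._hashes = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of MD5 as a 64-bit ring position (not used for security).
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            position = self._hash(f"{node}#{i}")
            if position not in self._owner:
                bisect.insort(self._hashes, position)
            self._owner[position] = node

    def remove(self, node: str) -> None:
        self._hashes = [h for h in self._hashes if self._owner[h] != node]
        self._owner = {h: n for h, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node that owns `key`: the first ring position clockwise from its hash."""
        if not self._hashes:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]


ring = HashRing(["location-1", "location-2", "location-3"])
print(ring.lookup("driver_123"))
```

When an instance is removed, only the drivers it owned move to other instances; every other driver keeps its assignment, which limits reconnect storms and cache churn during failover.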
Scalability Bottlenecks
- Location writes: 830k RPS → shard by driver_id or by city.
- Geo queries: Redis cluster, geo-sharded by city.
- Matching: Parallelize the driver search and time out after 2 s.
Performance Optimization
- Caching:
- Driver locations in Redis (TTL 30 s). Redis expires whole keys, not individual geo-set members, so stale drivers need per-driver keys or a periodic cleanup.
- Fare estimates cached for common routes.
- Batching: Location updates batched every 3-5 s.
- Async processing:
- Payment processing async.
- Rating updates async.
- Connection pooling: Database connection pools.
Edge Cases
- Driver offline during ride: Re-match the rider with another driver.
- No drivers available: Queue the ride and notify the rider.
- GPS inaccuracy: Snap to road, use last known good location.
- Payment failure: Retry logic, fallback to cash.
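The payment-failure case can be sketched as a retry loop with exponential backoff and a cash fallback. The `charge` callable, the `PaymentError` exception, and the idempotency key are hypothetical stand-ins for a real payment gateway client; reusing one key across retries is assumed to be how the gateway avoids charging twice.

```python
import time
import uuid


class PaymentError(Exception):
    """Raised by the (hypothetical) gateway client on a failed charge attempt."""


def charge_with_retry(charge, amount, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Try to charge the card; fall back to cash after `max_attempts` failures.

    `charge(amount, idempotency_key)` is a caller-supplied function that returns
    a payment reference or raises PaymentError.
    """
    # One key for all attempts, so a retry after a timeout cannot double-charge
    # (assuming the gateway deduplicates on this key).
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            reference = charge(amount, idempotency_key)
            return {"method": "card", "reference": reference, "attempts": attempt}
        except PaymentError:
            if attempt < max_attempts:
                sleep(base_delay_s * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
    return {"method": "cash", "reference": None, "attempts": max_attempts}
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.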
Step 6: Trade-offs
Consistency vs Availability
- CP system for pricing and payment: Strong consistency.
- AP system for location tracking: Eventual consistency (a delay of a few seconds is acceptable).
Latency vs Accuracy
- Low latency matching: Match with the nearest driver within 2 s.
- Better match: Wait 5-10 s to find a better driver, at the cost of a slower response.
Precision vs Cost
- High precision (GeoHash 8): ~38 m x 19 m cells, more data.
- Lower precision (GeoHash 6): ~1.2 km x 0.6 km cells, less data.
- Solution: Dynamic precision based on density.
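One way to read "dynamic precision based on density" is to pick the coarsest GeoHash precision whose cells still hold a manageable number of drivers. The sketch below does that with approximate cell sizes; the target of 50 drivers per cell and the function name are assumptions for illustration.

```python
# Approximate geohash cell dimensions in km (width, height) near the equator.
CELL_SIZE_KM = {
    5: (4.89, 4.89),
    6: (1.22, 0.61),
    7: (0.153, 0.153),
    8: (0.0382, 0.0191),
}


def choose_precision(drivers_per_km2: float, max_drivers_per_cell: float = 50.0) -> int:
    """Pick the coarsest precision whose expected drivers per cell stays under the cap."""
    for precision in sorted(CELL_SIZE_KM):
        width_km, height_km = CELL_SIZE_KM[precision]
        if drivers_per_km2 * width_km * height_km <= max_drivers_per_cell:
            return precision
    return max(CELL_SIZE_KM)  # extremely dense area: use the finest level available


print(choose_precision(1.0))    # sparse suburb
print(choose_precision(100.0))  # dense city centre
```

Sparse areas end up with large cells, so a search touches few cells, while dense centres get small cells, so each cell returns a short candidate list.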
Real-time vs Batch
- Real-time: Location streaming, instant matching.
- Batch: Analytics, surge pricing calculation, driver incentives.
Conclusion
A ride-sharing system is a complex distributed system with several challenges:
- Real-time geospatial processing: Millions of location updates per second at peak.
- Matching algorithm: Balance latency against match quality.
- Dynamic pricing: Balancing supply and demand.
- High availability: Critical to the driver and rider experience.
- Payment integrity: Consistent, secure transactions.