Case Study 2: Chat Application (WhatsApp/Telegram)
A real-time chat application with support for group chat, multimedia, and presence.
Step 1: Gather requirements
Functional requirements
- 1-on-1 messaging: Send/receive messages between two users.
- Group chat: Up to 1,000 participants.
- Multimedia: Images, videos, voice messages.
- Presence: Online/offline status, last seen.
- Read receipts: Delivered and read status.
- Push notifications: For offline users.
Non-functional requirements
- Low latency: Message delivery < 500 ms.
- High availability: 99.9% uptime.
- Scalability: Billions of users, millions of concurrent connections.
- Reliability: No message loss.
- Ordering: Messages are displayed in the correct order.
Scale estimation
- Users: 1 billion DAU (Daily Active Users).
- Concurrent connections: 100 million.
- Messages per day: 100 billion.
- Media uploads: 10 billion per day.
Step 2: Estimation
Traffic estimates
- Messages: 100B / 86,400 s ≈ 1.15 million RPS (average).
- Peak RPS: ~5x average → ~6 million RPS.
- Concurrent connections: 100 triệu WebSocket connections.
Storage estimates
- Text message: 1 KB per message.
- Daily text storage: 100B * 1 KB ≈ 100 TB.
- Media storage: 10B * 500 KB ≈ 5 PB per day.
- 5 years: (5 PB + 100 TB) per day × 365 × 5 ≈ 9.3 EB, i.e. ~10 EB (excluding replication).
Bandwidth estimates
- Upload: 1.15M RPS * 1 KB ≈ 1.15 GB/s (text only).
- Download: Fan-out 10x → 11.5 GB/s.
- Media: 10B * 500 KB / 86,400 s ≈ 58 GB/s of upload on average, far higher than text, so a CDN is required.
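The Step 2 arithmetic can be reproduced with a short back-of-envelope script. It is a sketch that uses only the inputs stated above (100B messages/day, 10B media uploads/day, ~1 KB per text message, ~500 KB per media item, 5x peak, 10x fan-out) and decimal units (1 KB = 1,000 bytes); the function name `estimate` is illustrative.

```python
# Back-of-envelope estimates for the chat system (decimal units: 1 KB = 1,000 bytes).
SECONDS_PER_DAY = 86_400


def estimate(messages_per_day=100e9, media_per_day=10e9, text_bytes=1_000,
             media_bytes=500_000, peak_factor=5, fanout=10, years=5):
    """Return traffic, storage, and bandwidth estimates as a dict."""
    avg_rps = messages_per_day / SECONDS_PER_DAY
    daily_text_bytes = messages_per_day * text_bytes
    daily_media_bytes = media_per_day * media_bytes
    return {
        "avg_rps": avg_rps,
        "peak_rps": avg_rps * peak_factor,
        "text_tb_per_day": daily_text_bytes / 1e12,
        "media_pb_per_day": daily_media_bytes / 1e15,
        # Raw data over the whole period, before replication.
        "raw_eb_total": (daily_text_bytes + daily_media_bytes) * 365 * years / 1e18,
        "text_up_gb_s": avg_rps * text_bytes / 1e9,
        # Each message is delivered to `fanout` recipients on average.
        "text_down_gb_s": avg_rps * text_bytes * fanout / 1e9,
        "media_up_gb_s": daily_media_bytes / SECONDS_PER_DAY / 1e9,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>18}: {value:,.2f}")
```

Calling `estimate()` gives an average of about 1.157 million messages/s (the list above truncates it to 1.15M) and roughly 9.3 EB of raw data over five years. Changing a single input, such as the fan-out, shows how sensitive the bandwidth figures are to that assumption.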
Step 3: High-Level Design
Main components
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Client    │ ──→ │ API Gateway  │ ──→ │ Chat Service │
│ (WebSocket)  │     │  (SSL term)  │     │ (Stateless)  │
└──────────────┘     └──────────────┘     └──────────────┘
                                                 │
       ┌────────────────────┬────────────────────┤
       ▼                    ▼                    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Message Queue│     │   Database   │     │    Cache     │
│   (Kafka)    │     │ (Cassandra)  │     │   (Redis)    │
└──────────────┘     └──────────────┘     └──────────────┘
       │
       ▼
┌──────────────┐
│   Delivery   │
│   Service    │
└──────────────┘
```
Technology selection
- Connection protocol: WebSocket (primary), Long polling (fallback).
- Load Balancer: NGINX with WebSocket support.
- App Servers: Stateless, horizontal scaling.
- Message Queue: Kafka for durability and replayability.
- Database: Cassandra for write-heavy, time-series data.
- Cache: Redis for presence, sessions, and recent messages.
- Media storage: S3 + CDN (CloudFront).
Step 4: Detailed Design
Database Schema
Table: messages
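| Column | Type | Description |
|---|---|---|
| message_id | UUID | Unique message ID (clustering key, tie-breaker) |
| chat_id | BIGINT | Chat room ID (partition key) |
| sender_id | BIGINT | User ID of the sender |
| content | TEXT | Message content |
| media_url | TEXT | Optional media link |
| created_at | TIMESTAMP | Send time (clustering key) |
| status | TINYINT | Sent/Delivered/Read |
Primary key: ((chat_id), created_at, message_id). All messages of a chat live in one partition, sorted by time, so loading the latest messages of a chat is a single-partition read.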
Table: chat_members
| Column | Type | Description |
|---|---|---|
| chat_id | BIGINT | Chat ID |
| user_id | BIGINT | User ID |
| role | TINYINT | Admin/Member |
| joined_at | TIMESTAMP | Join time |
Table: user_presence
| Column | Type | Description |
|---|---|---|
| user_id | BIGINT | Primary key (Redis) |
| status | TINYINT | Online/Offline |
| last_seen | TIMESTAMP | Last activity |
API Design
WebSocket Connection:
wss://chat.example.com/ws?token=xxx
Message Send:
```json
{
  "type": "message",
  "chat_id": 12345,
  "content": "Hello!",
  "media_url": null
}
```
Message Receive:
```json
{
  "type": "message",
  "message_id": "uuid",
  "chat_id": 12345,
  "sender_id": 67890,
  "content": "Hello!",
  "created_at": "2024-01-01T12:00:00Z"
}
```
Presence Update:
```json
{
  "type": "presence",
  "user_id": 67890,
  "status": "online"
}
```
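A client or gateway has to dispatch these frames by their `type` field. The following is a minimal sketch, assuming frames arrive as JSON text and that a `message` frame carrying a `message_id` is server-to-client while one without it is client-to-server; the class names and `parse_frame` are hypothetical, not part of any library.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SendMessage:          # client → server
    chat_id: int
    content: str
    media_url: Optional[str] = None


@dataclass
class ReceivedMessage:      # server → client
    message_id: str
    chat_id: int
    sender_id: int
    content: str
    created_at: str


@dataclass
class PresenceUpdate:
    user_id: int
    status: str


Frame = Union[SendMessage, ReceivedMessage, PresenceUpdate]


def parse_frame(raw: str) -> Frame:
    """Decode one WebSocket text frame into a typed object based on its "type" field."""
    data = json.loads(raw)
    kind = data.pop("type", None)
    if kind == "message":
        # Only server-assigned messages carry a message_id.
        if "message_id" in data:
            return ReceivedMessage(**data)
        return SendMessage(**data)
    if kind == "presence":
        if data.get("status") not in ("online", "offline"):
            raise ValueError(f"unknown presence status: {data.get('status')!r}")
        return PresenceUpdate(**data)
    raise ValueError(f"unknown frame type: {kind!r}")
```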
Data Flow
Send Message (1-on-1):
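- The client sends the message over WebSocket.
- Chat Service receives and validates it.
- Persist the message to the database (partitioned by chat_id).
- Publish an event to Kafka, keyed by `chat_id` (conceptually a `chat-{chat_id}` stream; a literal topic per chat would not scale to billions of chats, whereas a shared topic keyed by chat_id keeps each chat's messages in one partition, in order).
- Delivery Service subscribes and forwards the message to the recipient.
- Update the cache (recent messages).
- Send a push notification if the recipient is offline.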
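The send path above can be sketched end to end with in-memory stand-ins: a dict for the Cassandra table partitioned by chat_id, a list of (key, event) pairs for a Kafka topic keyed by chat_id, and a trimmed list for the recent-messages cache. Everything here (`InMemoryChat`, its fields, the membership check) is hypothetical and single-process; it only illustrates the order of the steps, in particular that the message is persisted before it is published and acknowledged.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone


class InMemoryChat:
    """Toy stand-ins: dict for Cassandra, list for Kafka, lists for the Redis cache."""

    def __init__(self, recent_limit=50):
        self.db = defaultdict(list)       # chat_id -> stored messages (one partition per chat)
        self.log = []                     # (key, event) pairs, like a Kafka topic keyed by chat_id
        self.recent = defaultdict(list)   # chat_id -> recent-messages cache
        self.online = set()               # users with a live WebSocket connection
        self.members = {}                 # chat_id -> set of user_ids
        self.pushes = []                  # (user_id, message_id) push notifications queued
        self.recent_limit = recent_limit

    def send(self, sender_id, chat_id, content, media_url=None):
        # 1. Validate: sender must be a member and the message must not be empty.
        if sender_id not in self.members.get(chat_id, set()):
            raise PermissionError("sender is not a member of this chat")
        if not content and not media_url:
            raise ValueError("empty message")
        msg = {
            "message_id": str(uuid.uuid4()),
            "chat_id": chat_id,
            "sender_id": sender_id,
            "content": content,
            "media_url": media_url,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "status": "sent",
        }
        # 2. Persist first (stands in for a durable write).
        self.db[chat_id].append(msg)
        # 3. Publish, keyed by chat_id so one chat's events stay in order.
        self.log.append((chat_id, msg))
        # 4. Update the recent-messages cache, keeping only the newest N.
        cache = self.recent[chat_id]
        cache.append(msg)
        del cache[:-self.recent_limit]
        # 5. Queue push notifications for offline recipients.
        for user in self.members[chat_id] - {sender_id}:
            if user not in self.online:
                self.pushes.append((user, msg["message_id"]))
        # The returned id acts as the server's ack to the sender.
        return msg["message_id"]
```

Writing to the database before publishing is what backs the no-message-loss requirement: if the process dies after the write, the message still exists and can be re-published.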
Receive Message:
- Delivery Service consumes the event from Kafka.
- Look up which server holds the recipient's connection.
- Forward the message over that WebSocket connection.
- The client acks → update status to “delivered”.
- When the user reads the message → update status to “read”.
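The receive path and the receipt updates can be modeled as a small state machine. This is a sketch, assuming status is tracked per (message, recipient) pair so the same logic also works for group chats, and that `connections` is a hypothetical user → gateway-server map (for example a session map kept in Redis). Because statuses only move forward (Sent → Delivered → Read), a late or duplicated ack cannot downgrade a message.

```python
from enum import IntEnum


class Status(IntEnum):
    # Mirrors the TINYINT status column: a higher value means further along.
    SENT = 0
    DELIVERED = 1
    READ = 2


class DeliveryService:
    def __init__(self):
        self.connections = {}   # user_id -> gateway server id (hypothetical session map)
        self.status = {}        # (message_id, user_id) -> Status
        self.outbox = []        # (server_id, user_id, message_id) frames handed to gateways

    def on_event(self, message_id, recipient_id):
        """Consume one message event and route it to the recipient's gateway server."""
        self.status.setdefault((message_id, recipient_id), Status.SENT)
        server = self.connections.get(recipient_id)
        if server is None:
            return False        # offline: stays SENT, the push-notification path takes over
        self.outbox.append((server, recipient_id, message_id))
        return True

    def on_ack(self, message_id, user_id, new_status):
        """Apply a client ack; status never moves backwards."""
        key = (message_id, user_id)
        current = self.status.get(key, Status.SENT)
        self.status[key] = max(current, new_status)
        return self.status[key]
```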
Presence System:
- When the client connects → set the Redis key `presence:{user_id}` = online.
- Heartbeat every 30 s to keep the connection alive.
- On disconnect or when the heartbeat timer expires → set offline and update last_seen.
- Subscribe to contacts' presence to receive updates.
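The heartbeat scheme above maps naturally onto a key with a TTL (in Redis, `SET presence:{user_id} online EX <seconds>`, refreshed on every heartbeat). The sketch below simulates that in memory; `PresenceStore` is hypothetical, and the 60 s TTL (two 30 s heartbeats) is an assumption, chosen so that one missed heartbeat does not flip a user offline, consistent with presence being stale for up to 30-60 s. The clock is injectable so the behavior can be tested deterministically.

```python
import time


class PresenceStore:
    """Simulates Redis keys presence:{user_id} whose TTL is refreshed by heartbeats."""

    HEARTBEAT_S = 30
    TTL_S = 2 * HEARTBEAT_S   # assumed: tolerate one missed heartbeat

    def __init__(self, clock=time.time):
        self.clock = clock
        self.expires = {}     # user_id -> time at which the "online" key expires
        self.last_seen = {}   # user_id -> time of last activity

    def connect(self, user_id):
        self.heartbeat(user_id)

    def heartbeat(self, user_id):
        now = self.clock()
        self.expires[user_id] = now + self.TTL_S   # like SET ... EX 60
        self.last_seen[user_id] = now

    def disconnect(self, user_id):
        # Explicit disconnect: delete the key immediately and record last_seen.
        self.expires.pop(user_id, None)
        self.last_seen[user_id] = self.clock()

    def status(self, user_id):
        exp = self.expires.get(user_id)
        if exp is not None and exp > self.clock():
            return "online"
        # Expired key: last_seen keeps the time of the final heartbeat.
        return "offline"
```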
Step 5: Bottlenecks & Optimization
Single Point of Failure
- WebSocket connections: Multiple servers with sticky sessions.
- Database: Cassandra with replication factor 3.
- Kafka: Multiple brokers, replicated topics.
- Redis: Sentinel or cluster mode.
Scalability Bottlenecks
- Connection scaling: Each server handles ~100k connections → 100M concurrent connections need ~1,000 servers.
- Database writes: Cassandra shards automatically by partition key.
- Message fan-out: Group chats with up to 1,000 members → batch delivery.
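For group chats, batch delivery can mean grouping the online recipients of one message by the gateway server that holds their connection, so the Delivery Service makes one call per server rather than one per member. This is a sketch, assuming a hypothetical user → server map and an arbitrary `max_batch` cap; offline members are returned separately for the push-notification path.

```python
from collections import defaultdict


def plan_fanout(sender_id, members, connections, max_batch=500):
    """Group online recipients by gateway server and split them into capped batches.

    Returns (batches, offline): batches is a list of (server_id, [user_ids]),
    offline is the list of members without a live connection.
    """
    by_server = defaultdict(list)
    offline = []
    for user in sorted(members):
        if user == sender_id:
            continue                      # the sender does not receive its own message
        server = connections.get(user)
        if server is None:
            offline.append(user)
        else:
            by_server[server].append(user)
    batches = []
    for server in sorted(by_server):
        users = by_server[server]
        for i in range(0, len(users), max_batch):
            batches.append((server, users[i:i + max_batch]))
    return batches, offline
```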
Performance Optimization
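- Message ordering: Use timestamp + a per-chat sequence number; the sequence number gives a strict order even when server clocks disagree or two messages share a timestamp.
- Recent messages cache: Redis sorted set holding the last 50 messages per chat.
- Media optimization: Compress images; adaptive bitrate for videos.
- Batch updates: Read receipts batched every 5 s.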
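Ordering and the recent-messages cache fit together: a per-chat sequence number gives a strict order, and a sorted structure scored by that number keeps the newest 50 entries even when events arrive out of order. The sketch below is in memory and its class names are hypothetical; in Redis the same effect would come from `INCR` for the counter and `ZADD` plus `ZREMRANGEBYRANK` for the sorted set.

```python
import bisect
import itertools
from collections import defaultdict


class ChatSequencer:
    """Per-chat monotonically increasing sequence numbers, starting at 1."""

    def __init__(self):
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_seq(self, chat_id):
        return next(self._counters[chat_id])


class RecentMessages:
    """Newest `limit` messages per chat, ordered by sequence number."""

    def __init__(self, limit=50):
        self.limit = limit
        self._items = defaultdict(list)   # chat_id -> sorted list of (seq, message_id)

    def add(self, chat_id, seq, message_id):
        items = self._items[chat_id]
        bisect.insort(items, (seq, message_id))   # late arrivals land in their correct slot
        if len(items) > self.limit:
            del items[: len(items) - self.limit]  # drop the oldest entries

    def latest(self, chat_id):
        return [message_id for _, message_id in self._items[chat_id]]
```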
Step 6: Trade-offs
Consistency vs Availability
- AP system: Eventual consistency for messages and presence.
- Messages may be delayed by a few seconds but are never lost.
- Presence may be stale for 30-60 s.
Latency vs Throughput
- Low latency: WebSocket persistent connection.
- High throughput: Batch message delivery cho group chat.
Cost vs Performance
- Managed services (AWS): More expensive, but with auto-scaling.
- Self-hosted: Cheaper, but requires a large 24/7 ops team.
WebSocket vs Long Polling
| Approach | Pros | Cons |
|---|---|---|
| WebSocket | Real-time, low latency, bidirectional | Complex, battery drain |
| Long Polling | Simple, works everywhere | Higher latency, more requests (a new request after every response) |
Conclusion
A chat application is a complex system with many challenges:
- Connection management for millions of concurrent users.
- Message ordering & delivery guarantees.
- A low-latency presence system.
- Media handling with large storage and bandwidth requirements.
- Push notifications for offline users.