Case Study 6: Video Streaming (YouTube/Netflix)
A video streaming platform covering upload, encoding, storage, delivery, and adaptive bitrate playback.
Step 1: Requirements Gathering
Functional requirements
- Video upload: Users upload videos (multiple formats, sizes).
- Video processing: Transcoding to multiple resolutions/bitrates.
- Video streaming: Adaptive bitrate streaming (HLS/DASH).
- Search & Discovery: Search, recommendations, trending.
- User features: Playlists, watch history, subscriptions.
- Comments & Likes: User interactions.
- Live streaming: Real-time video streaming (optional).
Non‑functional requirements
- Low latency: Video start < 2s, seek < 1s.
- High availability: 99.9% uptime.
- Scalability: Billions of videos, millions of concurrent viewers.
- Bandwidth: Efficient delivery, global scale.
- Quality: Adaptive bitrate based on network.
Scale estimation
- Users: 2 billion users.
- Videos: 10 billion videos.
- Uploads per day: 500,000 hours of video.
- Concurrent viewers: 10 million.
- Bandwidth: 100 Tbps peak.
Step 2: Estimation
Traffic estimates
- Upload volume: 500k hours/day * 60 min/hour * 100 MB/min ≈ 3 PB/day ≈ 90 PB/month.
- Streaming requests: 2B users * 1 hour/day ≈ 2B hours/day.
- Concurrent streams: 10 million.
- Bandwidth: 10M concurrent * 5 Mbps ≈ 50 Tbps.
Storage estimates
- Raw video: 500k hours/day * 60 min * 100 MB/min ≈ 3 PB/day.
- Encoded video (5 renditions): 3 PB * 5 ≈ 15 PB/day.
- Monthly storage: 15 PB * 30 ≈ 450 PB/month.
- Total storage: Exabyte scale (450 PB/month passes 1 EB in under 3 months).
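Bandwidth estimates
- Upload: 3 PB/day ≈ 90 PB/month ≈ 35 GB/s (≈ 280 Gbps).
- Download: 10M concurrent * 5 Mbps ≈ 50 Tbps sustained; provision ≈ 100 Tbps for peak. Taken literally, 2B hours/day at 5 Mbps would average ≈ 83M concurrent streams (≈ 417 Tbps), so the 10M-concurrent figure implies well under 1 hour/day per user.
- CDN cache hit: 90% → Origin bandwidth ≈ 5 Tbps sustained, ≈ 10 Tbps at peak.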
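The arithmetic behind these estimates can be checked with a short script. This is a minimal sketch: every constant is one of the assumptions listed in this step (500k upload hours/day, 100 MB/min raw, 5 renditions, 10M concurrent viewers at 5 Mbps, 90% cache hit), decimal units are assumed, and the function names are illustrative only.

```python
# Back-of-envelope capacity figures. Every constant is an assumption taken
# from the estimates above; decimal units are used (1 PB = 10**15 bytes).
PB = 10**15
SECONDS_PER_DAY = 86_400

UPLOAD_HOURS_PER_DAY = 500_000
RAW_BYTES_PER_MINUTE = 100 * 10**6   # 100 MB per minute of raw video
RENDITIONS = 5
CONCURRENT_VIEWERS = 10_000_000
STREAM_MBPS = 5
CDN_HIT_RATIO = 0.90


def raw_bytes_per_day() -> int:
    """Raw ingest: hours/day * 60 min/hour * bytes/min."""
    return UPLOAD_HOURS_PER_DAY * 60 * RAW_BYTES_PER_MINUTE


def encoded_bytes_per_day() -> int:
    """Upper bound: treats each rendition as being as large as the raw upload."""
    return raw_bytes_per_day() * RENDITIONS


def ingest_gb_per_second() -> float:
    """Average upload rate in GB/s, spread evenly over the day."""
    return raw_bytes_per_day() / SECONDS_PER_DAY / 10**9


def egress_tbps() -> float:
    """Viewer-facing bandwidth in Tbps for the assumed concurrency."""
    return CONCURRENT_VIEWERS * STREAM_MBPS / 10**6


def origin_tbps() -> float:
    """Share of egress that misses the CDN cache and reaches the origin."""
    return egress_tbps() * (1 - CDN_HIT_RATIO)


if __name__ == "__main__":
    print(f"raw ingest:     {raw_bytes_per_day() / PB:.1f} PB/day")
    print(f"encoded output: {encoded_bytes_per_day() / PB:.1f} PB/day")
    print(f"ingest rate:    {ingest_gb_per_second():.1f} GB/s")
    print(f"egress:         {egress_tbps():.1f} Tbps")
    print(f"origin load:    {origin_tbps():.1f} Tbps")
```

Changing a single input (for example the cache hit ratio) shows how sensitive the origin load is to that assumption.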
Step 3: High-Level Design
Main components
┌──────────┐     ┌─────────────┐     ┌──────────────┐
│  Client  │ ──→ │     CDN     │ ──→ │ Edge Server  │
│ (Web/Mob)│     │ (CloudFront)│     │              │
└──────────┘     └─────────────┘     └──────────────┘
                                            │
                                   ┌────────┴────────┐
                                   ▼                 ▼
                           ┌───────────────┐ ┌───────────────┐
                           │   Streaming   │ │      API      │
                           │    Service    │ │    Service    │
                           └───────────────┘ └───────────────┘
                                   │                 │
                                   ▼                 ▼
                           ┌───────────────┐ ┌───────────────┐
                           │ Object Store  │ │  PostgreSQL   │
                           │     (S3)      │ │  (Metadata)   │
                           └───────────────┘ └───────────────┘
                                   ▲
                                   │
                           ┌───────────────┐
                           │  Transcoding  │
                           │   Pipeline    │
                           └───────────────┘
                                   ▲
                                   │
                           ┌───────────────┐
                           │    Upload     │
                           │    Service    │
                           └───────────────┘
Technology selection
- Upload: Direct-to-S3 with presigned URLs.
- Transcoding: AWS Elemental, FFmpeg cluster.
- Storage: S3 for video files, Glacier for archive.
- CDN: CloudFront, Akamai for edge caching.
- Streaming: HLS (Apple), DASH (MPEG) for adaptive bitrate.
- Database: PostgreSQL for metadata, Cassandra for analytics.
- Search: Elasticsearch for video search.
- Cache: Redis for metadata, trending.
Step 4: Detailed Design
Database Schema
Table: videos
| Column | Type | Description |
|---|---|---|
| video_id | UUID | Primary key |
| uploader_id | BIGINT | Foreign key |
| title | VARCHAR(500) | Video title |
| description | TEXT | Video description |
| duration | INT | Duration in seconds |
| status | SMALLINT | Processing/Ready/Private (PostgreSQL has no TINYINT) |
| view_count | BIGINT | Denormalized count |
| created_at | TIMESTAMP | Upload time |
Table: video_renditions
| Column | Type | Description |
|---|---|---|
| rendition_id | UUID | Primary key |
| video_id | UUID | Foreign key |
| resolution | VARCHAR(10) | 1080p, 720p, 480p, etc. |
| bitrate | INT | Bitrate in kbps |
| codec | VARCHAR(20) | H.264, H.265, VP9 |
| s3_key | VARCHAR(500) | S3 object key |
| file_size | BIGINT | File size in bytes |
Table: video_segments
| Column | Type | Description |
|---|---|---|
| rendition_id | UUID | Foreign key |
| segment_num | INT | Segment number (0, 1, 2…) |
| s3_key | VARCHAR(500) | Segment file key |
| duration | DECIMAL(5,3) | Segment duration (~10s) |
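To make the relationships between the three tables concrete, here is a minimal sketch that creates them in an in-memory SQLite database and looks up the ordered segment keys for one rendition. SQLite stands in for PostgreSQL only so the example is self-contained; the column types are simplified accordingly. The numeric status codes, the composite key on `video_segments`, and the helper names `build_demo_db` and `segment_keys` are assumptions for illustration, not part of the design above.

```python
import sqlite3
import uuid

# SQLite stand-in for the PostgreSQL schema; types are simplified.
SCHEMA = """
CREATE TABLE videos (
    video_id    TEXT PRIMARY KEY,            -- UUID stored as text
    uploader_id INTEGER NOT NULL,
    title       TEXT NOT NULL,
    description TEXT,
    duration    INTEGER,                     -- seconds
    status      INTEGER NOT NULL DEFAULT 0,  -- assumed: 0=processing, 1=ready, 2=private
    view_count  INTEGER NOT NULL DEFAULT 0,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE video_renditions (
    rendition_id TEXT PRIMARY KEY,
    video_id     TEXT NOT NULL REFERENCES videos(video_id),
    resolution   TEXT NOT NULL,
    bitrate      INTEGER NOT NULL,           -- kbps
    codec        TEXT NOT NULL,
    s3_key       TEXT NOT NULL,
    file_size    INTEGER
);
CREATE TABLE video_segments (
    rendition_id TEXT NOT NULL REFERENCES video_renditions(rendition_id),
    segment_num  INTEGER NOT NULL,
    s3_key       TEXT NOT NULL,
    duration     REAL NOT NULL,
    PRIMARY KEY (rendition_id, segment_num)  -- assumed composite key
);
"""


def build_demo_db() -> tuple[sqlite3.Connection, str]:
    """Create the schema and insert one video with a 3-segment 1080p rendition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    video_id, rendition_id = str(uuid.uuid4()), str(uuid.uuid4())
    conn.execute(
        "INSERT INTO videos (video_id, uploader_id, title, duration) VALUES (?, ?, ?, ?)",
        (video_id, 42, "My Video", 30),
    )
    conn.execute(
        "INSERT INTO video_renditions VALUES (?, ?, ?, ?, ?, ?, ?)",
        (rendition_id, video_id, "1080p", 5000, "H.264",
         f"videos/{video_id}/1080p/index.m3u8", None),
    )
    conn.executemany(
        "INSERT INTO video_segments VALUES (?, ?, ?, ?)",
        [(rendition_id, n, f"videos/{video_id}/1080p/segment{n}.ts", 10.0)
         for n in range(3)],
    )
    conn.commit()
    return conn, video_id


def segment_keys(conn: sqlite3.Connection, video_id: str, resolution: str) -> list[str]:
    """Return the segment object keys for one rendition, in playback order."""
    rows = conn.execute(
        """SELECT s.s3_key
           FROM video_segments s
           JOIN video_renditions r ON r.rendition_id = s.rendition_id
           WHERE r.video_id = ? AND r.resolution = ?
           ORDER BY s.segment_num""",
        (video_id, resolution),
    ).fetchall()
    return [row[0] for row in rows]
```

In practice the player reads segment locations from the manifest rather than from this table; the lookup is shown only to illustrate how the foreign keys connect.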
Video Processing Pipeline
Upload → Transcode → Store → Distribute
1. Upload:
   - Client requests a presigned URL from the Upload Service.
   - Client uploads directly to S3 (multipart upload for large files).
   - S3 event triggers the Transcoding Pipeline.
2. Transcoding:
   - Download the raw video from S3.
   - Generate multiple renditions:
     - 4K (2160p): 20 Mbps
     - 1080p: 5 Mbps
     - 720p: 2.5 Mbps
     - 480p: 1 Mbps
     - 360p: 0.5 Mbps
   - Split into segments (~10s each).
   - Generate manifest files (.m3u8 for HLS, .mpd for DASH).
   - Upload renditions + manifests to S3.
3. Storage:
   - Hot storage (S3 Standard): Popular videos.
   - Cold storage (S3 Glacier): Old/rarely accessed videos.
   - Lifecycle policies auto-transition objects after 90 days.
4. Distribution:
   - CDN pulls from the S3 origin.
   - Edge servers cache segments.
   - TTL based on video popularity.
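The transcoding step can be sketched as one FFmpeg invocation per rendition. The sketch below only builds the argument lists and does not run FFmpeg. The bitrate ladder and 10-second segments come from the list above; the codec choices (`libx264`, `aac`), the 128k audio bitrate, and the output path layout are assumptions, and a production pipeline might use a managed service instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str        # directory name, e.g. "1080p"
    height: int      # output height in pixels
    video_kbps: int  # target video bitrate


# Bitrate ladder from the pipeline description above.
LADDER = [
    Rendition("2160p", 2160, 20_000),
    Rendition("1080p", 1080, 5_000),
    Rendition("720p", 720, 2_500),
    Rendition("480p", 480, 1_000),
    Rendition("360p", 360, 500),
]


def ffmpeg_hls_command(source: str, out_dir: str, rendition: Rendition,
                       segment_seconds: int = 10) -> list[str]:
    """Build (but do not run) an FFmpeg command producing one HLS rendition.

    The output directory for the rendition must exist before FFmpeg runs.
    """
    target = f"{out_dir}/{rendition.name}"
    return [
        "ffmpeg", "-y", "-i", source,
        "-vf", f"scale=-2:{rendition.height}",   # keep aspect ratio, even width
        "-c:v", "libx264", "-b:v", f"{rendition.video_kbps}k",
        "-c:a", "aac", "-b:a", "128k",           # assumed audio settings
        "-f", "hls",
        "-hls_time", str(segment_seconds),
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{target}/segment%d.ts",
        f"{target}/index.m3u8",
    ]


if __name__ == "__main__":
    for r in LADDER:
        print(" ".join(ffmpeg_hls_command("raw.mp4", "out", r)))
```

A worker could pass each list to `subprocess.run(cmd, check=True)` and upload the resulting directory to S3.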
Adaptive Bitrate Streaming
HLS (HTTP Live Streaming):
# Master playlist (master.m3u8)
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/index.m3u8

# Media playlist (1080p/index.m3u8)
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
...
#EXT-X-ENDLIST
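Playlists like the ones above are plain text and easy to generate. The following is a minimal sketch of both generators; the function names and the `segmentN.ts` naming are illustrative assumptions, and real encoders such as FFmpeg normally write these files themselves.

```python
import math


def master_playlist(variants: list[tuple[int, tuple[int, int], str]]) -> str:
    """Build a master playlist from (bandwidth_bps, (width, height), uri) entries.

    Variants are listed from highest to lowest bandwidth.
    """
    lines = ["#EXTM3U"]
    for bandwidth, (width, height), uri in sorted(variants, reverse=True):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"


def media_playlist(total_seconds: float, segment_seconds: int = 10) -> str:
    """Build a VOD media playlist; the last segment may be shorter than the rest."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{segment_seconds}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    count = math.ceil(total_seconds / segment_seconds)
    for n in range(count):
        remaining = total_seconds - n * segment_seconds
        lines.append(f"#EXTINF:{min(segment_seconds, remaining):.1f},")
        lines.append(f"segment{n}.ts")
    lines.append("#EXT-X-ENDLIST")  # marks the playlist as complete
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist([
        (5_000_000, (1920, 1080), "1080p/index.m3u8"),
        (2_500_000, (1280, 720), "720p/index.m3u8"),
        (1_000_000, (854, 480), "480p/index.m3u8"),
    ]))
    print(media_playlist(25))
```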
Client adaptation:
- Client monitors bandwidth and buffer level.
- Client switches rendition when network conditions change.
- Quality adjusts seamlessly at segment boundaries.
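The client-side decision can be sketched as a simple throughput rule with a buffer guard. This is an illustrative heuristic, not the algorithm any particular player uses: the 0.8 safety factor, the 5-second low-buffer threshold, and halving the budget when the buffer is low are all assumptions.

```python
# Bitrate ladder in kbps, taken from the rendition list in this case study.
LADDER_KBPS = {
    "2160p": 20_000,
    "1080p": 5_000,
    "720p": 2_500,
    "480p": 1_000,
    "360p": 500,
}


def choose_rendition(throughput_kbps: float, buffer_seconds: float,
                     safety: float = 0.8,
                     low_buffer_seconds: float = 5.0) -> str:
    """Pick the highest rendition whose bitrate fits the bandwidth budget.

    The budget is a fraction of measured throughput, and it is halved when
    the playback buffer is nearly empty so the player refills it quickly.
    """
    budget = throughput_kbps * safety
    if buffer_seconds < low_buffer_seconds:
        budget /= 2
    by_bitrate = sorted(LADDER_KBPS.items(), key=lambda item: item[1], reverse=True)
    for name, kbps in by_bitrate:
        if kbps <= budget:
            return name
    return by_bitrate[-1][0]  # nothing fits: fall back to the lowest rendition


if __name__ == "__main__":
    print(choose_rendition(throughput_kbps=8_000, buffer_seconds=20))
    print(choose_rendition(throughput_kbps=8_000, buffer_seconds=2))
```

Because every rendition is cut at the same segment boundaries, the player can apply the new choice on the next segment request.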
API Design
POST /api/v1/videos/upload
{
  "title": "My Video",
  "description": "...",
  "file_size": 1073741824,
  "duration": 600
}
Response:
{
  "video_id": "uuid",
  "upload_url": "https://s3.amazonaws.com/...?signature=xxx",
  "status": "processing"
}

GET /api/v1/videos/{id}/stream
Response:
{
  "manifest_url": "https://cdn.example.com/videos/{id}/master.m3u8",
  "duration": 600,
  "available_resolutions": ["1080p", "720p", "480p"]
}
GET /api/v1/videos/{id}
POST /api/v1/videos/{id}/like
POST /api/v1/videos/{id}/comment
Data Flow
Upload Video:
- Client sends POST /videos/upload.
- API creates the video metadata row (status=PROCESSING).
- API returns a presigned S3 URL.
- Client uploads directly to S3.
- S3 event → SQS message.
- Transcoding Service consumes the message.
- Worker downloads the raw file, transcodes it, and uploads the renditions.
- Video status is updated to READY.
- CDN cache is invalidated.
- User is notified.
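The queue-driven part of this flow can be sketched with an in-process queue standing in for SQS and a dict standing in for the `videos` table. The `FAILED` status, the placeholder `transcode` function, and the duplicate-message check are assumptions added for illustration; the duplicate check matters because SQS standard queues deliver at least once.

```python
import queue
from enum import Enum


class Status(Enum):
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"   # assumed extra state, not in the schema above


videos: dict[str, Status] = {}             # stand-in for the videos table
jobs: "queue.Queue[dict]" = queue.Queue()  # stand-in for the SQS queue


def on_upload_complete(video_id: str) -> None:
    """Simulates the S3 event: mark the video and enqueue a transcode job."""
    videos[video_id] = Status.PROCESSING
    jobs.put({"video_id": video_id, "s3_key": f"raw/{video_id}"})


def transcode(s3_key: str) -> list[str]:
    """Placeholder for download + FFmpeg + upload; returns manifest keys."""
    return [f"{s3_key}/{name}/index.m3u8" for name in ("1080p", "720p", "480p")]


def drain(transcode_fn=transcode) -> int:
    """Process all queued jobs once; returns how many were actually handled."""
    handled = 0
    while True:
        try:
            message = jobs.get_nowait()
        except queue.Empty:
            return handled
        video_id = message["video_id"]
        if videos.get(video_id) is Status.READY:
            continue  # duplicate delivery: the work is already done
        try:
            transcode_fn(message["s3_key"])
            videos[video_id] = Status.READY
        except Exception:
            videos[video_id] = Status.FAILED  # a real worker would retry or dead-letter
        handled += 1
```

Scaling out means running more copies of the `drain` loop against the shared queue, which is why the status update has to be idempotent.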
Stream Video:
- Client sends GET /videos/{id}/stream.
- API returns the manifest URL (on the CDN).
- Client requests the manifest from the CDN.
- CDN serves from edge cache (or pulls from origin on a miss).
- Client downloads segments adaptively.
- Client reports view progress and quality changes.
- Periodic heartbeat: update view_count and watch history.
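Writing to the database on every heartbeat would not scale, so heartbeats are typically aggregated and flushed in batches. The sketch below is one way to do that in memory. The rule that a view counts once per session after 30 seconds of playback is an assumption, as are the class and method names.

```python
from collections import Counter


class ViewAggregator:
    """Buffers heartbeats in memory and emits batched view_count increments."""

    def __init__(self, count_after_seconds: float = 30.0) -> None:
        self._threshold = count_after_seconds  # assumed rule for counting a view
        self._pending: Counter[str] = Counter()
        self._counted: set[tuple[str, str]] = set()
        self._positions: dict[tuple[str, str], float] = {}

    def heartbeat(self, session_id: str, user_id: str, video_id: str,
                  position_seconds: float) -> None:
        """Record progress; count the view once per session past the threshold."""
        self._positions[(user_id, video_id)] = position_seconds
        key = (session_id, video_id)
        if position_seconds >= self._threshold and key not in self._counted:
            self._counted.add(key)
            self._pending[video_id] += 1

    def watch_position(self, user_id: str, video_id: str) -> float:
        """Latest known position, used to resume playback from watch history."""
        return self._positions.get((user_id, video_id), 0.0)

    def flush(self) -> dict[str, int]:
        """Return increments to apply in one batched UPDATE, then reset."""
        batch = dict(self._pending)
        self._pending.clear()
        return batch
```

A periodic job would call `flush()` and apply each increment with a single `UPDATE videos SET view_count = view_count + ?` per video.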
Step 5: Bottlenecks & Optimization
Single Point of Failure
- Transcoding: Multiple workers, auto-scaling.
- S3: Built-in redundancy (99.999999999% durability, i.e. 11 nines).
- CDN: Multiple CDN providers (CloudFront + Akamai).
Scalability Bottlenecks
- Transcoding: Horizontal scaling with queue-based processing.
- CDN origin: S3 behind CloudFront, with a 90%+ cache hit ratio.
- Metadata reads: Read replicas, Redis cache.
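For metadata reads, the usual pattern is cache-aside: check the cache, fall back to the database on a miss, then populate the cache. The sketch below uses a small in-process TTL cache as a stand-in for Redis; the class, the `load_from_db` callback, and the TTL are illustrative assumptions.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-process stand-in for Redis with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: dict[str, tuple[dict, float]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._data[key] = (value, self._clock() + self._ttl)


def get_video_metadata(video_id: str, cache: TTLCache,
                       load_from_db: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Cache-aside read: serve from cache, else load from the DB and cache it."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    row = load_from_db(video_id)
    if row is not None:
        cache.set(video_id, row)
    return row
```

A short TTL bounds how stale a title or status can be; frequently changing fields such as view_count are better served from their own counters.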
Performance Optimization
- Edge caching: CDN caches segments at the edge.
- Prefetching: Client prefetch next segments.
- Parallel downloads: Download multiple segments concurrently.
- Compression: gzip manifest files.
Cost Optimization
- Storage tiering: Hot → Cold → Archive.
- CDN optimization: Cache policies, compression.
- Transcoding: Spot instances for non-urgent jobs.
- Regional encoding: Transcode close to upload location.
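Storage tiering boils down to a rule that maps how long a video has been idle to a tier. A minimal sketch follows: the 90-day boundary matches the lifecycle policy mentioned earlier in this case study, while the 365-day archive boundary and the use of last-viewed date (rather than object age) are assumptions.

```python
from datetime import date

HOT_DAYS = 90        # matches the 90-day lifecycle transition in this case study
ARCHIVE_DAYS = 365   # hypothetical boundary between cold and archive


def storage_tier(last_viewed: date, today: date) -> str:
    """Classify a video as hot, cold, or archive by days since its last view."""
    idle_days = (today - last_viewed).days
    if idle_days < HOT_DAYS:
        return "hot"
    if idle_days < ARCHIVE_DAYS:
        return "cold"
    return "archive"


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for viewed in (date(2024, 5, 20), date(2024, 1, 1), date(2023, 1, 1)):
        print(viewed, storage_tier(viewed, today))
```

Colder tiers trade lower storage cost for slower retrieval, so a video that becomes popular again needs to be restored to hot storage before it can meet the start-time target.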
Step 6: Trade-offs
Consistency vs Availability
- Favor availability (AP): View counts are eventually consistent (a delay is acceptable).
- Favor consistency (CP): Video availability after upload (status must be consistent before playback).
Latency vs Quality
- Low latency: Start with low resolution, scale up.
- High quality: Buffer more before start.
- Solution: Adaptive bitrate balances both.
Storage vs Bandwidth
- More renditions: Better UX, higher storage cost.
- Fewer renditions: Less storage, worse UX for some users.
- Solution: 5-7 renditions is a common compromise.
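The storage side of this trade-off follows directly from the bitrates: bytes per hour = bitrate * 3600 / 8. The sketch below applies that to the ladder used in this case study. It counts video bitrate only and ignores audio and container overhead, which is a simplifying assumption.

```python
# Video bitrates in Mbps, from the rendition ladder in this case study.
LADDER_MBPS = {"2160p": 20.0, "1080p": 5.0, "720p": 2.5, "480p": 1.0, "360p": 0.5}


def gigabytes_per_hour(mbps: float) -> float:
    """Storage for one hour of video at the given bitrate (decimal GB)."""
    return mbps * 1_000_000 * 3600 / 8 / 1e9


def ladder_gb_per_hour(ladder: dict[str, float]) -> float:
    """Total storage for one hour of content across all renditions."""
    return sum(gigabytes_per_hour(mbps) for mbps in ladder.values())


if __name__ == "__main__":
    for name, mbps in LADDER_MBPS.items():
        print(f"{name:>6}: {gigabytes_per_hour(mbps):6.3f} GB/hour")
    print(f" total: {ladder_gb_per_hour(LADDER_MBPS):6.3f} GB/hour")
```

Under these assumptions the 2160p rendition alone accounts for 9 GB of the 13.05 GB per hour, so whether to encode 4K for every upload dominates the storage cost far more than the number of low-resolution renditions.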
HLS vs DASH
| Format | Pros | Cons |
|---|---|---|
| HLS | Widely supported, native on Apple devices | Apple-controlled; narrower codec support (H.264, plus HEVC via fMP4) |
| DASH | Open standard, codec-agnostic | Less supported on iOS |
Conclusion
A video streaming platform is an extremely complex system, with challenges including:
- Massive storage: Exabyte scale.
- Bandwidth optimization: CDN, adaptive bitrate.
- Processing pipeline: Transcoding millions of videos per day.
- Global delivery: Low latency worldwide.
- Cost management: Storage, bandwidth, transcoding costs.