Case Study 6: Video Streaming (YouTube/Netflix)

A video streaming platform covering upload, encoding, storage, delivery, and adaptive bitrate streaming.


Step 1: Gather requirements

Functional requirements

  • Video upload: Users upload videos (multiple formats, sizes).
  • Video processing: Transcoding to multiple resolutions/bitrates.
  • Video streaming: Adaptive bitrate streaming (HLS/DASH).
  • Search & Discovery: Search, recommendations, trending.
  • User features: Playlists, watch history, subscriptions.
  • Comments & Likes: User interactions.
  • Live streaming: Real-time video streaming (optional).

Non‑functional requirements

  • Low latency: Video start < 2s, seek < 1s.
  • High availability: 99.9% uptime.
  • Scalability: Billions of videos, millions of concurrent viewers.
  • Bandwidth: Efficient delivery, global scale.
  • Quality: Adaptive bitrate based on network.

Scale estimation

  • Users: 2 billion.
  • Videos: 10 billion.
  • Uploads per day: 500,000 hours of video.
  • Concurrent viewers: 10 million.
  • Bandwidth: 100 Tbps peak.

Step 2: Estimation

Traffic estimates

  • Upload volume: 500k hours/day × 60 min/hour × 100 MB/min ≈ 3 PB/day ≈ 90 PB/month.
  • Streaming requests: 2B users × 1 hour/day ≈ 2B hours/day.
  • Concurrent streams: 10 million.
  • Bandwidth: 10M concurrent * 5 Mbps ≈ 50 Tbps.

Storage estimates

  • Raw video: 500k hours/day * 60 min * 100 MB/min ≈ 3 PB/day.
  • Encoded video (5 renditions): 3 PB × 5 ≈ 15 PB/day (an upper bound that assumes each rendition is as large as the source).
  • Monthly storage: 15 PB * 30 ≈ 450 PB/month.
  • Total storage: Exabytes scale.

Bandwidth estimates

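  • Upload: 3 PB/day ÷ 86,400 s ≈ 35 GB/s (≈ 280 Gbps).
  • Download: 2B hours/day ÷ 24 ≈ 83M average concurrent streams × 5 Mbps ≈ 417 Tbps. This is far above the 10M-concurrent figure (≈ 50 Tbps), so the two assumptions do not agree; the 10M-concurrent case is the more conservative baseline.
  • CDN cache hit: 90% → origin bandwidth ≈ 10% of delivery, i.e. ≈ 5 Tbps at 50 Tbps (≈ 42 Tbps at 417 Tbps).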
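
The arithmetic in this step is easy to get wrong by a factor of 60 or 3,600, so it can help to recompute it in code. The sketch below is only a back-of-envelope calculator: every constant is one of the assumptions listed in this chapter, decimal units are assumed (1 PB = 10^6 GB), and the function and key names are made up for illustration.

```python
# Back-of-envelope capacity math. All inputs are the chapter's assumptions.
HOURS_UPLOADED_PER_DAY = 500_000
RAW_MB_PER_MIN = 100
RENDITIONS = 5
CONCURRENT_VIEWERS = 10_000_000
WATCH_HOURS_PER_DAY = 2_000_000_000
STREAM_MBPS = 5
CACHE_HIT_RATIO = 0.90

SECONDS_PER_DAY = 86_400
MB_PER_PB = 1_000_000_000   # decimal units: 1 PB = 10^9 MB
GB_PER_PB = 1_000_000
MBPS_PER_TBPS = 1_000_000


def estimate() -> dict:
    raw_pb_day = HOURS_UPLOADED_PER_DAY * 60 * RAW_MB_PER_MIN / MB_PER_PB
    # Upper bound: treats every rendition as being as large as the source.
    encoded_pb_day = raw_pb_day * RENDITIONS
    # Delivery bandwidth from the concurrent-viewer assumption.
    concurrent_tbps = CONCURRENT_VIEWERS * STREAM_MBPS / MBPS_PER_TBPS
    # Delivery bandwidth implied by total watch hours, averaged over a day.
    avg_streams = WATCH_HOURS_PER_DAY / 24
    watch_hours_tbps = avg_streams * STREAM_MBPS / MBPS_PER_TBPS
    return {
        "raw_pb_per_day": raw_pb_day,
        "raw_pb_per_month": raw_pb_day * 30,
        "encoded_pb_per_month": encoded_pb_day * 30,
        "ingest_gb_per_s": raw_pb_day * GB_PER_PB / SECONDS_PER_DAY,
        "concurrent_tbps": concurrent_tbps,
        "watch_hours_tbps": watch_hours_tbps,
        "origin_tbps_at_concurrent": concurrent_tbps * (1 - CACHE_HIT_RATIO),
    }


if __name__ == "__main__":
    for key, value in estimate().items():
        print(f"{key}: {value:.1f}")
```

With these inputs, 500k hours/day at 100 MB/min comes to about 3 PB/day (roughly 90 PB/month) of raw uploads, and the watch-hours figure implies far more bandwidth than the concurrent-viewer figure, which is worth calling out in an interview.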

Step 3: High-level design

Main components

┌──────────┐     ┌─────────────┐     ┌──────────────┐
│  Client  │ ──→ │ CDN         │ ──→ │ Edge Server  │
│ (Web/Mob)│     │ (CloudFront)│     │              │
└──────────┘     └─────────────┘     └──────────────┘
                                            │
                                    ┌───────┴───────┐
                                    ▼               ▼
                            ┌───────────────┐ ┌───────────────┐
                            │   Streaming   │ │   API         │
                            │   Service     │ │   Service     │
                            └───────────────┘ └───────────────┘
                                    │               │
                                    ▼               ▼
                            ┌───────────────┐ ┌───────────────┐
                            │ Object Store  │ │   PostgreSQL  │
                            │ (S3)          │ │   (Metadata)  │
                            └───────────────┘ └───────────────┘
                                    ▲
                                    │
                            ┌───────────────┐
                            │  Transcoding  │
                            │  Pipeline     │
                            └───────────────┘
                                    ▲
                                    │
                            ┌───────────────┐
                            │   Upload      │
                            │   Service     │
                            └───────────────┘

Technology selection

  • Upload: Direct-to-S3 with presigned URLs.
  • Transcoding: AWS Elemental, FFmpeg cluster.
  • Storage: S3 for video files, Glacier for archive.
  • CDN: CloudFront, Akamai for edge caching.
  • Streaming: HLS (Apple), DASH (MPEG) for adaptive bitrate.
  • Database: PostgreSQL for metadata, Cassandra for analytics.
  • Search: Elasticsearch for video search.
  • Cache: Redis for metadata and trending.

Step 4: Detailed design

Database Schema

Table: videos

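| Column      | Type         | Description                                      |
|-------------|--------------|--------------------------------------------------|
| video_id    | UUID         | Primary key                                      |
| uploader_id | BIGINT       | Foreign key                                      |
| title       | VARCHAR(500) | Video title                                      |
| description | TEXT         | Video description                                |
| duration    | INT          | Duration in seconds                              |
| status      | SMALLINT     | Processing/Ready/Private (PostgreSQL has no TINYINT) |
| view_count  | BIGINT       | Denormalized count                               |
| created_at  | TIMESTAMP    | Upload time                                      |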

Table: video_renditions

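| Column       | Type         | Description             |
|--------------|--------------|-------------------------|
| rendition_id | UUID         | Primary key             |
| video_id     | UUID         | Foreign key             |
| resolution   | VARCHAR(10)  | 1080p, 720p, 480p, etc. |
| bitrate      | INT          | Bitrate in kbps         |
| codec        | VARCHAR(20)  | H.264, H.265, VP9       |
| s3_key       | VARCHAR(500) | S3 object key           |
| file_size    | BIGINT       | File size in bytes      |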

Table: video_segments

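| Column       | Type         | Description               |
|--------------|--------------|---------------------------|
| rendition_id | UUID         | Foreign key               |
| segment_num  | INT          | Segment number (0, 1, 2…) |
| s3_key       | VARCHAR(500) | Segment file key          |
| duration     | DECIMAL(5,3) | Segment duration (~10s)   |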

Video Processing Pipeline

Upload → Transcode → Store → Distribute

1. Upload:

  • Client requests a presigned URL from the Upload Service.
  • Client uploads directly to S3 (multipart upload for large files).
  • S3 event triggers the Transcoding Pipeline.
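
For the multipart case, the client (or the Upload Service on its behalf) has to decide how to cut the file into parts. Below is a minimal sketch of that planning step only. It relies on two S3 multipart limits, a 5 MiB minimum size for every part except the last and at most 10,000 parts per upload; the function name `plan_parts` and the 64 MiB default part size are assumptions made for this example, not something the design above prescribes.

```python
import math

MIN_PART_SIZE = 5 * 1024**2   # S3 minimum for every part except the last
MAX_PARTS = 10_000            # S3 maximum number of parts per upload


def plan_parts(file_size: int, part_size: int = 64 * 1024**2) -> list[tuple[int, int, int]]:
    """Return (part_number, first_byte, last_byte_inclusive) for each part.

    Hypothetical helper: the part size is raised when needed so that the
    upload respects the minimum part size and stays within MAX_PARTS.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(file_size / MAX_PARTS))
    parts = []
    for number, start in enumerate(range(0, file_size, part_size), start=1):
        end = min(start + part_size, file_size) - 1
        parts.append((number, start, end))
    return parts


if __name__ == "__main__":
    plan = plan_parts(1_073_741_824)   # the 1 GiB file from the API example
    print(len(plan), plan[0], plan[-1])
```

Each planned part would then be uploaded with its own presigned URL, and failed parts can be retried individually without restarting the whole upload.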

2. Transcoding:

  • Download the raw video from S3.
  • Generate multiple renditions:
    • 4K (2160p): 20 Mbps
    • 1080p: 5 Mbps
    • 720p: 2.5 Mbps
    • 480p: 1 Mbps
    • 360p: 0.5 Mbps
  • Split into segments (~10s each).
  • Generate manifest files (.m3u8 for HLS, .mpd for DASH).
  • Upload renditions + manifests to S3.
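
One way to picture the transcoding step is as one FFmpeg invocation per rendition. The sketch below only builds the command lines and does not run them. The ladder values come from the list above; the `libx264` and `aac` encoders, the output directory layout, and the helper name `hls_command` are assumptions for illustration. The `-hls_time`, `-hls_playlist_type`, and `-hls_segment_filename` options belong to FFmpeg's HLS muxer.

```python
# (name, frame height, video bitrate in kbps), taken from the ladder above.
LADDER = [
    ("2160p", 2160, 20000),
    ("1080p", 1080, 5000),
    ("720p", 720, 2500),
    ("480p", 480, 1000),
    ("360p", 360, 500),
]


def hls_command(src: str, out_dir: str, name: str, height: int, kbps: int,
                segment_seconds: int = 10) -> list[str]:
    """Build an ffmpeg argv that encodes one rendition as segmented HLS."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",           # keep aspect ratio, even width
        "-c:v", "libx264", "-b:v", f"{kbps}k",  # assumed encoder
        "-c:a", "aac",                          # assumed audio codec
        "-f", "hls",
        "-hls_time", str(segment_seconds),      # target segment length
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/{name}/segment%d.ts",
        f"{out_dir}/{name}/index.m3u8",         # media playlist for this rendition
    ]


if __name__ == "__main__":
    for name, height, kbps in LADDER:
        print(" ".join(hls_command("input.mp4", "out", name, height, kbps)))
```

A worker could pass each list to `subprocess.run(cmd, check=True)`. Because the renditions are independent of each other, they can also be fanned out to separate workers.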

3. Storage:

  • Hot storage (S3 Standard): Popular videos.
  • Cold storage (S3 Glacier): Old/rarely accessed videos.
  • Lifecycle policies auto-transition objects after 90 days.
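
The 90-day rule can be expressed as an S3 lifecycle configuration. The dictionary below follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID and the `videos/` prefix are hypothetical. Note that standard lifecycle transitions are keyed on object age, not on how often an object is read, so a "rarely accessed" policy would need extra logic that is not shown here. The helper `storage_class_for` is only an illustration of what the rule implies.

```python
import json

# Lifecycle configuration: move objects under a prefix to Glacier at 90 days.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "videos-to-glacier-after-90-days",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "videos/"},           # hypothetical key prefix
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }
    ]
}


def storage_class_for(age_days: int) -> str:
    """Storage class an object of this age would have under LIFECYCLE."""
    transitions = sorted(
        LIFECYCLE["Rules"][0]["Transitions"],
        key=lambda t: t["Days"],
        reverse=True,
    )
    for transition in transitions:
        if age_days >= transition["Days"]:
            return transition["StorageClass"]
    return "STANDARD"


if __name__ == "__main__":
    print(json.dumps(LIFECYCLE, indent=2))
    print(storage_class_for(30), storage_class_for(120))
```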

4. Distribution:

  • The CDN pulls from the S3 origin.
  • Edge servers cache segments.
  • TTL based on video popularity.

Adaptive Bitrate Streaming

HLS (HTTP Live Streaming):

# Master playlist (master.m3u8)
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/index.m3u8

# Media playlist (1080p/index.m3u8)
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
...
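
Playlists like the ones above are plain text, so the packaging step can generate them directly. The following is a minimal sketch, not a complete HLS packager. The function names and the `<name>/index.m3u8` layout are assumptions, and the media playlist adds three tags that the sample above omits: `#EXT-X-VERSION:3` (needed for decimal `EXTINF` durations), `#EXT-X-PLAYLIST-TYPE:VOD`, and `#EXT-X-ENDLIST`.

```python
import math


def master_playlist(renditions: list[tuple[str, int, int, int]]) -> str:
    """renditions: (name, bandwidth in bits/s, width, height)."""
    lines = ["#EXTM3U"]
    for name, bandwidth, width, height in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"


def media_playlist(segment_durations: list[float]) -> str:
    """Build a VOD media playlist for segments named segment0.ts, segment1.ts, ..."""
    # The target duration must not be shorter than any segment.
    target = math.ceil(max(segment_durations))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for index, duration in enumerate(segment_durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"segment{index}.ts")
    lines.append("#EXT-X-ENDLIST")   # tells the player the stream is complete
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist([
        ("1080p", 5_000_000, 1920, 1080),
        ("720p", 2_500_000, 1280, 720),
        ("480p", 1_000_000, 854, 480),
    ]))
    print(media_playlist([10.0, 10.0, 4.5]))
```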

Client adaptation:

  • Client monitors bandwidth and buffer level.
  • Switches rendition when network conditions change.
  • Seamless quality adjustment.
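
The adaptation logic can be sketched as a simple throughput-and-buffer rule. This is a deliberately naive illustration; production players use more elaborate heuristics. The 0.8 safety factor, the 5-second low-buffer threshold, and the function name `pick_rendition` are all assumptions, and the bitrates are the ones from the transcoding ladder in this chapter.

```python
# Rendition name -> video bitrate in kbps (from the transcoding ladder).
LADDER_KBPS = {
    "2160p": 20000,
    "1080p": 5000,
    "720p": 2500,
    "480p": 1000,
    "360p": 500,
}


def pick_rendition(throughput_kbps: float, buffer_s: float,
                   safety: float = 0.8, low_buffer_s: float = 5.0) -> str:
    """Pick the rendition for the next segment.

    Chooses the highest bitrate that fits within a safety margin of the
    measured throughput, and falls back to the lowest rendition when the
    playback buffer is nearly empty.
    """
    by_bitrate = sorted(LADDER_KBPS.items(), key=lambda item: item[1])
    lowest = by_bitrate[0][0]
    if buffer_s < low_buffer_s:
        return lowest                 # avoid a stall before chasing quality
    budget = throughput_kbps * safety
    choice = lowest
    for name, kbps in by_bitrate:
        if kbps <= budget:
            choice = name
    return choice


if __name__ == "__main__":
    print(pick_rendition(8000, 20))   # healthy network and buffer
    print(pick_rendition(8000, 2))    # same network, buffer almost empty
```

Because every rendition is cut at the same segment boundaries, the player can switch between segments without a visible break.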

API Design

POST /api/v1/videos/upload
{
  "title": "My Video",
  "description": "...",
  "file_size": 1073741824,
  "duration": 600
}

Response:
{
  "video_id": "uuid",
  "upload_url": "https://s3.amazonaws.com/...?signature=xxx",
  "status": "processing"
}

GET /api/v1/videos/{id}/stream
Response:
{
  "manifest_url": "https://cdn.example.com/videos/{id}/master.m3u8",
  "duration": 600,
  "available_resolutions": ["1080p", "720p", "480p"]
}

GET /api/v1/videos/{id}
POST /api/v1/videos/{id}/like
POST /api/v1/videos/{id}/comment

Data Flow

Upload Video:

  1. Client sends POST /videos/upload.
  2. API creates the video metadata (status=PROCESSING).
  3. API returns a presigned S3 URL.
  4. Client uploads directly to S3.
  5. S3 event → SQS message.
  6. Transcoding Service consumes the message.
  7. It downloads the source, transcodes it, and uploads the renditions.
  8. It updates the video status to READY.
  9. Invalidate the CDN cache for the video.
  10. Notify the user.
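
Standard SQS queues deliver messages at least once, so the consumer in steps 6 to 8 should tolerate duplicates. The sketch below shows one way to make the handler idempotent by checking the video status first. The in-memory `videos` dict stands in for the `videos` table, and the message shape, `transcode` placeholder, and function names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PROCESSING = "processing"
    READY = "ready"


# Stands in for the videos table: video_id -> status.
videos: dict = {}


def transcode(video_id: str) -> None:
    """Placeholder for download, transcode, and upload of renditions."""


def handle_upload_event(message: dict) -> bool:
    """Process one queue message; return True only if work was done.

    A duplicate delivery, or a message for an unknown video, is a no-op.
    If transcoding raises, the status stays PROCESSING so that the
    message can be retried when it becomes visible again.
    """
    video_id = message["video_id"]   # hypothetical message field
    if videos.get(video_id) is not Status.PROCESSING:
        return False
    transcode(video_id)
    videos[video_id] = Status.READY
    return True


if __name__ == "__main__":
    videos["v1"] = Status.PROCESSING
    print(handle_upload_event({"video_id": "v1"}))   # first delivery
    print(handle_upload_event({"video_id": "v1"}))   # duplicate delivery
```

With several workers, the status check and update would need to be a single atomic operation (for example a conditional UPDATE in the database); that part is left out of this sketch.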

Stream Video:

  1. Client sends GET /videos/{id}/stream.
  2. API returns the manifest URL (on the CDN).
  3. Client requests the manifest from the CDN.
  4. CDN serves it from the edge cache (or pulls it from the origin).
  5. Client downloads segments adaptively.
  6. Client tracks view progress and quality changes.
  7. Periodic heartbeat: update view_count and watch history.
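
Writing to `view_count` on every heartbeat would put a lot of write load on a hot row for a popular video. A common pattern is to buffer increments and flush them in batches, accepting that the stored count lags slightly. The class below is a minimal in-memory sketch of that idea; the class name, the `first_beat` flag, and the dict standing in for the database are assumptions.

```python
from collections import Counter


class ViewCounter:
    """Buffers view increments and applies them to the store in batches."""

    def __init__(self, store: dict):
        self.store = store          # stands in for videos.view_count
        self.pending = Counter()    # video_id -> views not yet flushed

    def heartbeat(self, video_id: str, first_beat: bool) -> None:
        # Count a view once per playback session, not once per heartbeat.
        if first_beat:
            self.pending[video_id] += 1

    def flush(self) -> int:
        """Apply buffered increments; return how many views were flushed."""
        flushed = sum(self.pending.values())
        for video_id, views in self.pending.items():
            self.store[video_id] = self.store.get(video_id, 0) + views
        self.pending.clear()
        return flushed


if __name__ == "__main__":
    store = {"v1": 10}
    counter = ViewCounter(store)
    counter.heartbeat("v1", first_beat=True)
    counter.heartbeat("v1", first_beat=False)
    counter.heartbeat("v2", first_beat=True)
    print(counter.flush(), store)
```

In a real deployment the buffer would live in something shared such as Redis, with a periodic job doing the flush.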

Step 5: Bottlenecks & optimization

Single Point of Failure

  • Transcoding: Multiple workers, auto-scaling.
  • S3: Built-in redundancy (11 9s durability).
  • CDN: Multiple CDN providers (CloudFront + Akamai).

Scalability Bottlenecks

  • Transcoding: Horizontal scaling with queue-based processing.
  • CDN origin: S3 behind CloudFront, 90%+ cache hit ratio.
  • Metadata reads: Read replicas, Redis cache.
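
For metadata reads, the usual arrangement is cache-aside: look in the cache first, fall back to the database on a miss, and store the result with a TTL. The sketch below uses an in-process dict in place of Redis and an injected loader in place of a read-replica query; the class name, the 60-second TTL, and the `clock` parameter (there to make expiry testable) are assumptions.

```python
import time


class MetadataCache:
    """Cache-aside lookup with a fixed TTL per entry."""

    def __init__(self, load_from_db, ttl_s: float = 60.0, clock=time.monotonic):
        self._load = load_from_db   # called on a miss, e.g. a replica query
        self._ttl = ttl_s
        self._clock = clock
        self._entries = {}          # video_id -> (expires_at, metadata)

    def get(self, video_id: str) -> dict:
        now = self._clock()
        entry = self._entries.get(video_id)
        if entry is not None and entry[0] > now:
            return entry[1]                          # fresh cache hit
        value = self._load(video_id)                 # miss or expired entry
        self._entries[video_id] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def load(video_id: str) -> dict:
        print("db read for", video_id)
        return {"video_id": video_id, "title": "My Video"}

    cache = MetadataCache(load)
    cache.get("abc")   # triggers a db read
    cache.get("abc")   # served from the cache
```

The TTL bounds how stale a title or status can be; fields that must be fresh right after an update would need explicit invalidation instead.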

Performance Optimization

  • Edge caching: The CDN caches segments at the edge.
  • Prefetching: Client prefetch next segments.
  • Parallel downloads: Download multiple segments concurrently.
  • Compression: gzip manifest files.

Cost Optimization

  • Storage tiering: Hot → Cold → Archive.
  • CDN optimization: Cache policies, compression.
  • Transcoding: Spot instances for non-urgent jobs.
  • Regional encoding: Transcode close to upload location.

Step 6: Trade-offs

Consistency vs Availability

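  • AP (favor availability): View counts are eventually consistent; a short delay is acceptable.
  • CP (favor consistency): A video's published state after upload must be consistent, so viewers never see a video marked ready before its renditions exist.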

Latency vs Quality

  • Low latency: Start with low resolution, scale up.
  • High quality: Buffer more before start.
  • Solution: Adaptive bitrate balances both.

Storage vs Bandwidth

  • More renditions: Better UX, higher storage cost.
  • Fewer renditions: Less storage, worse UX for some users.
  • Solution: A ladder of about 5-7 renditions is a reasonable compromise.

HLS vs DASH

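| Format | Pros                             | Cons                                                      |
|--------|----------------------------------|-----------------------------------------------------------|
| HLS    | Widely supported, Apple ecosystem | Apple-controlled; narrower codec choice (mainly H.264 and HEVC) |
| DASH   | Open standard, codec-agnostic    | Limited support on iOS                                    |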

Conclusion

A video streaming platform is an extremely complex system. Its main challenges are:

  • Massive storage: Exabytes scale.
  • Bandwidth optimization: CDN, adaptive bitrate.
  • Processing pipeline: Transcoding millions of videos per day.
  • Global delivery: Low latency worldwide.
  • Cost management: Storage, bandwidth, transcoding costs.
