Basic Concepts & Connecting from .NET

Installation

# Official client package for Elasticsearch 8+
dotnet add package Elastic.Clients.Elasticsearch

# If you are on Elasticsearch 7 (NEST, the legacy client)
dotnet add package NEST

Connecting in ASP.NET Core

// appsettings.json
{
  "Elasticsearch": {
    "Uri": "https://localhost:9200",
    "Username": "elastic",
    "Password": "changeme",
    "DefaultIndex": "products"
  }
}

// Program.cs
using Elastic.Clients.Elasticsearch;
using Elastic.Transport;

var builder = WebApplication.CreateBuilder(args);

// ElasticsearchClient is thread-safe and meant to be reused: register a single instance
builder.Services.AddSingleton<ElasticsearchClient>(sp =>
{
    var config = builder.Configuration.GetSection("Elasticsearch");
    var uri = new Uri(config["Uri"]!);

    var settings = new ElasticsearchClientSettings(uri)
        .Authentication(new BasicAuthentication(config["Username"]!, config["Password"]!))
        .DefaultIndex(config["DefaultIndex"]!)
        // Map C# types to index names (Product and Order are your own document classes)
        .DefaultMappingFor<Product>(m => m.IndexName("products"))
        .DefaultMappingFor<Order>(m => m.IndexName("orders"));

    if (builder.Environment.IsDevelopment())
    {
        settings
            .EnableDebugMode()           // Development only: collects request/response debug info
            .DisableDirectStreaming();   // Buffers request/response bodies so they can be read
    }

    return new ElasticsearchClient(settings);
});

var app = builder.Build();

// Check the connection at startup
var es = app.Services.GetRequiredService<ElasticsearchClient>();
var ping = await es.PingAsync();
if (!ping.IsValidResponse)
    throw new InvalidOperationException("Cannot connect to Elasticsearch");

app.Run();

Index, Document, Shard

Comparison with SQL:

Elasticsearch          SQL
─────────────         ──────────────
Index            ≈    Table
Document         ≈    Row
Field            ≈    Column
Mapping          ≈    Schema

Note: there is no higher-level "Database" concept as in SQL.
Elasticsearch → Index → Document

Document

The basic unit of data, stored as JSON.

// A document in the "products" index (metadata fields plus the original JSON in _source)
{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "_source": {
    "name": "iPhone 15 Pro",
    "brand": "Apple",
    "price": 999.99,
    "category": "smartphones",
    "tags": ["5G", "flagship", "camera"],
    "specs": {
      "storage": "256GB",
      "ram": "8GB",
      "screen": "6.1 inch"
    },
    "in_stock": true,
    "created_at": "2024-01-15T10:30:00Z"
  }
}

Index

A collection of documents with a similar structure.

# Create an index
PUT /products
{
  "settings": {
    "number_of_shards": 3,     # Number of primary shards
    "number_of_replicas": 1    # Number of replicas per primary shard
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "price": { "type": "float" }
    }
  }
}

# Show index information (settings, mappings, aliases)
GET /products

# Delete the index
DELETE /products

# List all indices
GET /_cat/indices?v

Shards & Replicas

┌──────────────────────────────────────────────────────────┐
│            Index "products" (1000 documents)             │
│                                                          │
│  Primary Shard 1    Primary Shard 2    Primary Shard 3   │
│  (~1/3 of docs)     (~1/3 of docs)     (~1/3 of docs)    │
│         │                  │                  │          │
│         ▼                  ▼                  ▼          │
│  Replica Shard 1    Replica Shard 2    Replica Shard 3   │
│  (copy of P1)       (copy of P2)       (copy of P3)      │
└──────────────────────────────────────────────────────────┘

Primary Shard:

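The number is fixed when the index is created (it cannot be changed in place; use the _split or _shrink API, or reindex)
Documents are routed to a primary shard by hashing their routing value (the _id by default), which spreads them roughly evenly
Default: 1 primary shard since ES 7.0 (5 in earlier versions)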
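Why is the primary shard count fixed? In its basic form, Elasticsearch routes a document with `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the `_id`. If the divisor changes, most existing documents would belong on a different shard. The sketch below illustrates only that idea. It uses 32-bit FNV-1a as a stand-in hash, while Elasticsearch actually uses Murmur3 plus a routing-shards factor, so the shard numbers it produces are not the ones a real cluster would choose. `Fnv1a` and `ShardFor` are hypothetical helpers.

```csharp
using System;
using System.Linq;
using System.Text;

// Stand-in hash (32-bit FNV-1a). Elasticsearch uses Murmur3; this only illustrates modulo routing.
uint Fnv1a(string s)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(s))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619);
    }
    return hash;
}

// Simplified routing rule: shard = hash(_routing) % number_of_primary_shards
int ShardFor(string routing, int primaryShards) =>
    (int)(Fnv1a(routing) % (uint)primaryShards);

var ids = Enumerable.Range(1, 1000).Select(i => i.ToString()).ToList();

// With 3 primaries, the 1000 IDs spread across shards 0..2
foreach (var shard in ids.GroupBy(id => ShardFor(id, 3)).OrderBy(g => g.Key))
    Console.WriteLine($"shard {shard.Key}: {shard.Count()} docs");

// Changing the primary count re-routes many documents, which is why it is fixed
int moved = ids.Count(id => ShardFor(id, 3) != ShardFor(id, 4));
Console.WriteLine($"{moved} of {ids.Count} docs change shard when going from 3 to 4 primaries");
```

The per-shard counts should come out roughly even, but the exact numbers depend on the stand-in hash.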

Replica Shard:

A copy of a primary shard
The number of replicas can be changed at any time
Adds read throughput and fault tolerance, but not write throughput (every write is also applied to each replica)
Never allocated on the same node as its own primary

# Change the replica count (allowed on an existing index)
PUT /products/_settings
{
  "number_of_replicas": 2
}

Choosing the Number of Shards

Guidelines:
- Aim for roughly 10-50GB of data per shard
- Number of shards ≈ (total data / 30GB), or the number of nodes
- Too many shards → per-shard overhead (memory, cluster state), slower cluster
- Too few shards → the index cannot be spread over more nodes as it grows

Examples:
- 30GB data → 1 primary shard
- 300GB data → 10 primary shards
- 3TB data on 10 nodes → 100 primary shards
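The rule of thumb above can be written as a small calculation. The sketch below is a heuristic under stated assumptions, not an Elastic formula. It targets about 30GB per primary shard with a minimum of 1. It then rounds up to a multiple of the data node count so that every node holds the same number of primaries, which is one possible reading of "or the number of nodes". `SuggestPrimaryShards` is a hypothetical helper. It reproduces the three examples listed above.

```csharp
using System;

// Heuristic: ~30GB per primary shard (at least 1), then round up to a
// multiple of the data node count (this rounding is an assumption).
int SuggestPrimaryShards(double totalDataGb, int dataNodes, double targetShardGb = 30)
{
    int bySize = Math.Max(1, (int)Math.Ceiling(totalDataGb / targetShardGb));
    if (dataNodes <= 1) return bySize;
    return (int)Math.Ceiling(bySize / (double)dataNodes) * dataNodes;
}

Console.WriteLine(SuggestPrimaryShards(30, 1));     // prints 1
Console.WriteLine(SuggestPrimaryShards(300, 1));    // prints 10
Console.WriteLine(SuggestPrimaryShards(3000, 10));  // prints 100
Console.WriteLine(SuggestPrimaryShards(300, 3));    // prints 12 (10 rounded up to a multiple of 3)
```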

Cluster & Nodes

# Check cluster health
GET /_cluster/health

# Response
{
  "cluster_name": "my-cluster",
  "status": "green",         # green/yellow/red
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 10,
  "active_shards": 20,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}

Cluster Status:

Green: all primary and replica shards are assigned and active
Yellow: all primary shards are active, but at least one replica is unassigned
Red: at least one primary shard is unassigned, so part of the data cannot be searched or written
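The three status rules can be expressed directly in code. This minimal sketch reduces each shard copy to two flags: whether it is a primary and whether it is assigned. It also assumes that every copy of a shard needs its own node, which is why a single-node cluster with replicas stays yellow. `ClusterStatus` and `IndexShards` are hypothetical helpers, not part of any Elasticsearch client. With 10 primaries and 1 replica on 3 nodes, the sketch gives the 20 shard copies and green status shown in the sample health response.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Status rules: red if any primary is unassigned,
// yellow if only replicas are unassigned, green otherwise.
string ClusterStatus(IEnumerable<(bool IsPrimary, bool IsAssigned)> shards)
{
    var all = shards.ToList();
    if (all.Any(s => s.IsPrimary && !s.IsAssigned)) return "red";
    if (all.Any(s => !s.IsPrimary && !s.IsAssigned)) return "yellow";
    return "green";
}

// Lists the shard copies of one index. Each copy of a shard needs a different
// node, so with N nodes only N - 1 replicas per primary can be assigned
// (nodes = 0 models losing every data node).
List<(bool IsPrimary, bool IsAssigned)> IndexShards(int primaries, int replicas, int nodes)
{
    var result = new List<(bool IsPrimary, bool IsAssigned)>();
    for (int p = 0; p < primaries; p++)
    {
        result.Add((true, nodes >= 1));
        for (int r = 0; r < replicas; r++)
            result.Add((false, r + 1 < nodes));
    }
    return result;
}

var productShards = IndexShards(primaries: 10, replicas: 1, nodes: 3);
Console.WriteLine($"{productShards.Count} shard copies -> {ClusterStatus(productShards)}"); // prints "20 shard copies -> green"
Console.WriteLine(ClusterStatus(IndexShards(10, 1, 1)));  // prints "yellow"
```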

Node Types:

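Master Node: manages the cluster state (creating/deleting indices, tracking nodes, allocating shards)
Data Node: stores data and executes CRUD and search operations
Ingest Node: pre-processes documents with ingest pipelines before they are indexed
Coordinating Node: routes requests to the relevant shards and merges their results (every node does this; a coordinating-only node has no other role)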
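To make "route requests, merge results" concrete, here is a simplified sketch of the merge step a coordinating node performs for a search. Each shard returns its local top hits, and the coordinator keeps the global top `size` by score. The shard results and the `MergeTopHits` helper are hypothetical. A real search also has a fetch phase that loads the winning documents from their shards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-shard results: each shard's local top hits as (DocId, Score)
(string DocId, double Score)[][] shardHits =
{
    new[] { ("a", 9.1), ("b", 4.0), ("c", 1.2) },   // shard 0
    new[] { ("d", 7.5), ("e", 6.3), ("f", 0.9) },   // shard 1
    new[] { ("g", 8.8), ("h", 2.2) },               // shard 2
};

// Merge step (simplified): combine all shard hits and keep the global top `size` by score
List<(string DocId, double Score)> MergeTopHits(
    IEnumerable<(string DocId, double Score)[]> perShard, int size) =>
    perShard.SelectMany(hits => hits)
            .OrderByDescending(hit => hit.Score)
            .Take(size)
            .ToList();

var top = MergeTopHits(shardHits, 3);
Console.WriteLine(string.Join(", ", top.Select(hit => hit.DocId))); // prints "a, g, d"
```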

Inverted Index

The mechanism (provided by Lucene) that makes full-text search in Elasticsearch fast.

Documents:
Doc 1: "The quick brown fox"
Doc 2: "The lazy brown dog"
Doc 3: "The fox ate the dog"

Inverted Index (the analyzer lowercases terms, so "The" and "the" are the same term):
┌──────────┬──────────────────┐
│  Term    │  Documents       │
├──────────┼──────────────────┤
│  the     │  [1, 2, 3]       │
│  quick   │  [1]             │
│  brown   │  [1, 2]          │
│  fox     │  [1, 3]          │
│  lazy    │  [2]             │
│  dog     │  [2, 3]          │
│  ate     │  [3]             │
└──────────┴──────────────────┘

Query: "fox"
→ Look up "fox" in the inverted index → Documents [1, 3]
→ Very fast: one term lookup instead of scanning every document
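The table above can be built in a few lines. This sketch assumes a toy analyzer that only lowercases and splits on spaces, while the real standard analyzer also handles punctuation, Unicode word boundaries and more. It also uses a hash map where Lucene uses a sorted term dictionary. `Analyze` and `Search` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The three example documents
var docs = new Dictionary<int, string>
{
    [1] = "The quick brown fox",
    [2] = "The lazy brown dog",
    [3] = "The fox ate the dog",
};

// Toy analyzer: lowercase + split on spaces
string[] Analyze(string input) =>
    input.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

// term -> sorted set of document IDs (the posting list)
var index = new Dictionary<string, SortedSet<int>>();
foreach (var (docId, body) in docs)
{
    foreach (var term in Analyze(body))
    {
        if (!index.TryGetValue(term, out var postings))
        {
            postings = new SortedSet<int>();
            index[term] = postings;
        }
        postings.Add(docId);
    }
}

// Single-term query: analyze it the same way, then do one dictionary lookup
SortedSet<int> Search(string query) =>
    index.TryGetValue(Analyze(query).Single(), out var hits) ? hits : new SortedSet<int>();

Console.WriteLine(string.Join(", ", Search("fox")));  // prints "1, 3"
Console.WriteLine(string.Join(", ", Search("The")));  // prints "1, 2, 3"
```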

Near Real-Time (NRT)

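Document indexed → Written to an in-memory buffer (and appended to the translog)
                 → Every 1 second (refresh): buffer → new segment
                 → The new segment is searchable (NRT)
                 → Periodically (flush): a Lucene commit fsyncs segments to disk and older
                   translog entries can be discarded (triggered mainly by translog size, not a fixed timer)

Default: a document becomes searchable after ~1 second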

# Force an immediate refresh (expensive; avoid calling it after every write)
POST /products/_refresh

# Change the refresh interval
PUT /products/_settings
{
  "refresh_interval": "30s"   # Longer interval = faster bulk indexing, later visibility
  # "refresh_interval": "-1"  # Disable automatic refresh entirely
}
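The refresh cycle described above can be modelled in a few lines. This is a toy model of a single shard with no translog or flush. `IndexDoc`, `Refresh` and `IsSearchable` are hypothetical names, not Elasticsearch or Lucene APIs. The model shows why a freshly indexed document is invisible to search until the next refresh.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of one shard: new documents land in a buffer and only become
// searchable once a refresh turns the buffer into a segment.
var buffer = new List<string>();
var segments = new List<List<string>>();

void IndexDoc(string doc) => buffer.Add(doc);

// Refresh: freeze the current buffer into a new searchable segment
void Refresh()
{
    if (buffer.Count == 0) return;
    segments.Add(new List<string>(buffer));
    buffer.Clear();
}

// Search only sees segments, never the buffer
bool IsSearchable(string doc) => segments.Any(segment => segment.Contains(doc));

IndexDoc("iPhone 15 Pro");
Console.WriteLine(IsSearchable("iPhone 15 Pro"));  // prints False (still in the buffer)
Refresh();                                         // Elasticsearch does this every refresh_interval
Console.WriteLine(IsSearchable("iPhone 15 Pro"));  // prints True
```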

Document Versioning

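# Every document has a version number that increases on each write
PUT /products/_doc/1
{ "name": "iPhone 15" }
# Response: "_version": 1, "_seq_no": 0, "_primary_term": 1 (e.g. on a freshly created index)

PUT /products/_doc/1
{ "name": "iPhone 15 Pro" }
# Response: "_version": 2, "_seq_no": 1, "_primary_term": 1

# Optimistic concurrency control: pass the _seq_no and _primary_term from your last read or write
PUT /products/_doc/1?if_seq_no=1&if_primary_term=1
{ "name": "Updated name" }
# Fails with 409 Conflict if someone else has modified the document in the meantime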
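Here is a toy model of optimistic concurrency control. It assumes a single in-memory shard where sequence numbers start at 0 and the primary term stays at 1. A conditional write succeeds only if the stored `_seq_no` and `_primary_term` still match, so when two clients race from the same read, the first write wins and the second is rejected. `Put` and `store` are hypothetical names, not client APIs.

```csharp
using System;
using System.Collections.Generic;

// Toy single-shard store: id -> (source, _seq_no, _primary_term, _version)
var store = new Dictionary<string, (string Source, long SeqNo, long PrimaryTerm, long Version)>();
long nextSeqNo = 0;          // the shard gives each write an increasing sequence number
const long primaryTerm = 1;  // would increase if a new primary were elected

// Conditional write: if ifSeqNo/ifPrimaryTerm are given they must match the
// stored values, otherwise the write is rejected (Elasticsearch returns 409).
bool Put(string id, string source, long? ifSeqNo = null, long? ifPrimaryTerm = null)
{
    bool exists = store.TryGetValue(id, out var current);
    if (ifSeqNo.HasValue || ifPrimaryTerm.HasValue)
    {
        if (!exists || current.SeqNo != ifSeqNo || current.PrimaryTerm != ifPrimaryTerm)
            return false;
    }
    long version = exists ? current.Version + 1 : 1;
    store[id] = (source, nextSeqNo++, primaryTerm, version);
    return true;
}

Put("1", "iPhone 15");       // _version 1, _seq_no 0
Put("1", "iPhone 15 Pro");   // _version 2, _seq_no 1

// Two clients both read the document at _seq_no 1, _primary_term 1
var (_, seqNo, term, _) = store["1"];
Console.WriteLine(Put("1", "Updated by A", seqNo, term));  // prints True
Console.WriteLine(Put("1", "Updated by B", seqNo, term));  // prints False (conflict)
Console.WriteLine(store["1"].Source);                      // prints Updated by A
```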
