Index Nâng Cao

1. Filtered Index (Index có điều kiện WHERE)

Filtered Index là non-clustered index được xây dựng chỉ trên một tập con hàng thỏa mãn điều kiện WHERE. Kết quả: index nhỏ hơn, hiệu quả hơn, ít maintenance hơn.

-- Ví dụ 1: Chỉ index các đơn hàng chưa hoàn thành
CREATE NONCLUSTERED INDEX IX_Orders_Pending
ON Orders (OrderDate, CustomerId)
INCLUDE (TotalAmount)
WHERE Status = 'Pending';   -- Filtered predicate

-- Ví dụ 2: Soft delete - chỉ index active records
CREATE NONCLUSTERED INDEX IX_Products_Active
ON Products (CategoryId, Price)
WHERE IsDeleted = 0;

-- Ví dụ 3: Sparse column - chỉ index hàng có giá trị (không NULL)
CREATE NONCLUSTERED INDEX IX_Employees_Email
ON Employees (Email)
WHERE Email IS NOT NULL;

Khi nào dùng Filtered Index

Use case	Ví dụ
Sparse data	Cột có nhiều NULL (chỉ index hàng có giá trị)
Soft deletes	`WHERE IsDeleted = 0`
Status filtering	`WHERE Status = 'Active'` hoặc `'Pending'`
Recent data	`WHERE CreatedDate >= '2024-01-01'`
Partial uniqueness	Unique trong subset (vd: unique email cho active users)

-- Filtered Unique Index: Email phải unique giữa các active users
CREATE UNIQUE NONCLUSTERED INDEX UIX_Users_Email_Active
ON Users (Email)
WHERE IsDeleted = 0;

-- Filtered index CHỈ được dùng khi query có predicate tương thích
-- Ví dụ: query này SẼ dùng IX_Orders_Pending
SELECT OrderId, OrderDate, CustomerId, TotalAmount
FROM Orders
WHERE Status = 'Pending'          -- ✅ Match filtered predicate
  AND OrderDate >= '2024-01-01';

-- Query này KHÔNG dùng IX_Orders_Pending (không có WHERE Status = 'Pending')
SELECT OrderId FROM Orders WHERE CustomerId = 123;

2. Columnstore Index

Columnstore Index lưu dữ liệu theo cột thay vì theo hàng - tối ưu cho analytical queries (OLAP) với aggregation trên nhiều hàng.

Clustered Columnstore Index (CCI)

Thay thế hoàn toàn table heap/clustered row-store.

-- Tạo bảng với Clustered Columnstore Index
CREATE TABLE FactSales (
    SaleId BIGINT,
    ProductId INT,
    CustomerId INT,
    SaleDate DATE,
    Quantity INT,
    UnitPrice DECIMAL(18,2),
    TotalAmount DECIMAL(18,2)
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON FactSales;

-- Hoặc khi tạo bảng (SQL Server 2022+)
-- WITH (CLUSTERED COLUMNSTORE INDEX)

Non-Clustered Columnstore Index (NCCI)

Thêm analytical capability vào bảng OLTP mà không thay đổi row-store.

-- Thêm NCCI vào bảng Orders OLTP để hỗ trợ reporting queries
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Analytics
ON Orders (OrderDate, CustomerId, ProductId, TotalAmount, Status);

Cơ chế hoạt động

Columnstore Storage (theo cột):
ProductId column:  [1, 1, 2, 3, 3, 3, 5, 5, ...]  → Compress tốt (giá trị lặp lại)
SaleDate column:   [2024-01-01, 2024-01-01, ...]   → RLE compression
TotalAmount column:[99.99, 149.00, 49.99, ...]

Row-store (theo hàng):
Row 1: [1, 1, 2024-01-01, 99.99, ...]
Row 2: [1, 1, 2024-01-01, 149.00, ...]

Đặc điểm	Row-store	Columnstore
Lưu trữ	Theo hàng	Theo cột
Compression	Thấp (~1x)	Cao (5-10x điển hình)
Point queries	Tốt	Kém
Aggregation trên nhiều hàng	Chậm	Rất nhanh (batch mode)
Write performance	Tốt	Kém hơn (delta store)
Phù hợp	OLTP	OLAP, Data Warehouse

Delta Store & Tuple Mover

Khi INSERT vào Columnstore:
1. Rows mới → Delta Store (row-store B-tree, nhỏ)
2. Khi Delta Store đủ lớn (~1M rows)
3. Tuple Mover (background) compress và chuyển vào Column Segments
4. Deleted rows → Delete Bitmap (không xóa ngay)

-- Kiểm tra trạng thái Delta Store và Deleted rows
SELECT 
    i.name,
    rg.state_desc,
    rg.total_rows,
    rg.deleted_rows,
    rg.size_in_bytes
FROM sys.column_store_row_groups rg
INNER JOIN sys.indexes i ON rg.object_id = i.object_id AND rg.index_id = i.index_id
WHERE rg.object_id = OBJECT_ID('FactSales');

Batch Mode Execution

Columnstore kích hoạt Batch Mode: xử lý 64-900 hàng cùng lúc thay vì từng hàng. Kết hợp với SIMD CPU instructions → nhanh hơn 5-100x cho analytical queries.

-- Query analytical hưởng lợi từ CCI + Batch Mode
SELECT 
    ProductId,
    YEAR(SaleDate) AS SaleYear,
    SUM(TotalAmount) AS Revenue,
    COUNT(*) AS TxnCount
FROM FactSales
GROUP BY ProductId, YEAR(SaleDate)
ORDER BY Revenue DESC;

3. Full-Text Index

Full-Text Index cho phép tìm kiếm ngôn ngữ tự nhiên trong văn bản - không thể làm được với LIKE hoặc standard index.

-- Bước 1: Tạo Full-Text Catalog
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;

-- Bước 2: Tạo Full-Text Index
CREATE FULLTEXT INDEX ON Products (ProductName, Description)
KEY INDEX PK_Products  -- Phải có unique index
ON ftCatalog
WITH STOPLIST = SYSTEM;  -- Dùng stopword list mặc định

-- CONTAINS: tìm kiếm chính xác
SELECT ProductId, ProductName
FROM Products
WHERE CONTAINS(ProductName, 'laptop');

-- CONTAINS với nhiều từ
WHERE CONTAINS(Description, '"high performance" AND "gaming"')

-- FREETEXT: tìm kiếm ngôn ngữ tự nhiên (stemming, synonyms)
WHERE FREETEXT(Description, 'fast computer processor')

-- CONTAINSTABLE: trả về ranking
SELECT p.ProductId, p.ProductName, ft.RANK
FROM Products p
INNER JOIN CONTAINSTABLE(Products, Description, 'gaming laptop') ft
    ON p.ProductId = ft.[KEY]
ORDER BY ft.RANK DESC;

LIKE vs Full-Text

	LIKE	Full-Text
Pattern matching	`'%keyword%'` (leading wildcard = scan)	Indexed word lookup
Hiệu suất	Chậm với `%...%`	Nhanh
Stemming (run/runs/running)	Không	Có
Proximity search	Không	Có (`NEAR`)
Ranking	Không	Có (`RANK`)
Setup	Không cần	Cần tạo catalog + index

4. XML Index

Cho bảng có cột kiểu XML - tăng tốc XQuery và XPath expressions.

-- Bảng với cột XML
CREATE TABLE Orders (
    OrderId INT PRIMARY KEY,
    OrderDetails XML
);

-- Primary XML Index: bắt buộc tạo trước
CREATE PRIMARY XML INDEX PXML_Orders_Details
ON Orders (OrderDetails);

-- Secondary XML Indexes (tùy loại query)
CREATE XML INDEX SXML_Orders_PATH   ON Orders (OrderDetails) USING XML INDEX PXML_Orders_Details FOR PATH;
CREATE XML INDEX SXML_Orders_VALUE  ON Orders (OrderDetails) USING XML INDEX PXML_Orders_Details FOR VALUE;
CREATE XML INDEX SXML_Orders_PROPERTY ON Orders (OrderDetails) USING XML INDEX PXML_Orders_Details FOR PROPERTY;

Secondary XML Index	Tối ưu cho
PATH	`/path` expressions
VALUE	`//tag = 'value'` lookups
PROPERTY	Đọc nhiều properties từ cùng node

5. Spatial Index

Cho cột kiểu GEOMETRY hoặc GEOGRAPHY.

-- Bảng với cột Geography
CREATE TABLE Locations (
    LocationId INT PRIMARY KEY,
    Name NVARCHAR(100),
    GeoPoint GEOGRAPHY
);

INSERT INTO Locations VALUES 
    (1, 'Hanoi', geography::STPointFromText('POINT(105.8412 21.0278)', 4326)),
    (2, 'Saigon', geography::STPointFromText('POINT(106.6297 10.8231)', 4326));

CREATE SPATIAL INDEX SIX_Locations_GeoPoint
ON Locations (GeoPoint)
USING GEOGRAPHY_GRID
WITH (GRIDS = (MEDIUM, MEDIUM, MEDIUM, MEDIUM), CELLS_PER_OBJECT = 16);

-- Query tìm locations trong vòng 10km từ một điểm
DECLARE @center GEOGRAPHY = geography::STPointFromText('POINT(105.8412 21.0278)', 4326);

SELECT Name, GeoPoint.STDistance(@center) AS DistanceMeters
FROM Locations
WHERE GeoPoint.STDistance(@center) <= 10000
ORDER BY DistanceMeters;

6. Unique Index

Đảm bảo không có giá trị trùng lặp trong cột/nhóm cột được index.

-- Unique index đơn giản
CREATE UNIQUE NONCLUSTERED INDEX UIX_Users_Email
ON Users (Email);

-- Unique index cho composite key
CREATE UNIQUE NONCLUSTERED INDEX UIX_Products_Category_Code
ON Products (CategoryId, ProductCode);

-- Xử lý NULL: Unique index cho phép nhiều NULL
-- (NULL != NULL trong SQL logic)
INSERT INTO Users (Email) VALUES (NULL);
INSERT INTO Users (Email) VALUES (NULL);  -- ✅ Thành công! NULL không vi phạm unique

-- Nếu muốn NULL cũng unique → dùng Filtered Index
CREATE UNIQUE NONCLUSTERED INDEX UIX_Users_Email_NotNull
ON Users (Email)
WHERE Email IS NOT NULL;

7. Index with INCLUDE Columns (Covering Index)

-- Composite key với INCLUDE: cân bằng between seek efficiency và covering
CREATE NONCLUSTERED INDEX IX_Orders_Customer_Date
ON Orders (CustomerId, OrderDate)   -- Key: dùng để seek, có thứ tự
INCLUDE (OrderId, Status, TotalAmount);  -- Chỉ ở leaf, không ảnh hưởng B-tree height

-- Tại sao không đưa tất cả vào key?
-- Key columns: xuất hiện ở mọi levels → làm B-tree rộng hơn → nhiều page hơn
-- INCLUDE: chỉ ở leaf level → không làm to B-tree → space hiệu quả hơn

Quyết định đưa cột vào Key hay INCLUDE

Điều kiện	Đưa vào Key	Đưa vào INCLUDE
Cần WHERE/JOIN/ORDER BY	✅	❌
Chỉ cần SELECT	❌	✅
Muốn có unique constraint	✅	❌ (không thể)

8. Composite Index: Thứ tự cột quan trọng

Column order trong composite index ảnh hưởng trực tiếp đến query nào có thể dùng index.

-- Index: (CustomerId, OrderDate, Status)
CREATE INDEX IX_Orders ON Orders (CustomerId, OrderDate, Status);

-- ✅ Có thể dùng index (leading columns match)
WHERE CustomerId = 100
WHERE CustomerId = 100 AND OrderDate = '2024-01-01'
WHERE CustomerId = 100 AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
WHERE CustomerId = 100 AND OrderDate = '2024-01-01' AND Status = 'Active'

-- ⚠️ Partial use (chỉ dùng CustomerId cho seek, Status filter sau)
WHERE CustomerId = 100 AND Status = 'Active'  -- Bỏ qua OrderDate

-- ❌ KHÔNG thể dùng index seek (không có leading column)
WHERE OrderDate = '2024-01-01'
WHERE Status = 'Active'

Nguyên tắc chọn thứ tự cột

Equality first: Cột dùng = đặt trước
Range last: Cột dùng >, <, BETWEEN đặt sau
Selectivity: Cột có nhiều distinct values → seek hiệu quả hơn
ORDER BY / GROUP BY: Nếu order match với index → tránh Sort operator

-- Query: WHERE Status = 'Active' AND CreatedDate > '2024-01-01'
-- Status có 3 giá trị, CreatedDate rất nhiều giá trị

-- Option A: (Status, CreatedDate) → seek trên Status=Active, range scan CreatedDate
-- Option B: (CreatedDate, Status) → range scan CreatedDate, filter Status
-- → Option A thường tốt hơn vì seek trên equality trước

9. Index Intersection vs Index Covering

Index Intersection

SQL Server dùng nhiều indexes cho cùng một query và merge kết quả (AND/OR).

-- Có 2 indexes riêng
CREATE INDEX IX_Orders_CustomerId ON Orders (CustomerId);
CREATE INDEX IX_Orders_Status ON Orders (Status);

-- Query sau có thể trigger Index Intersection
WHERE CustomerId = 100 AND Status = 'Active';
-- SQL Server: seek IX_Orders_CustomerId → get OrderIds
--             seek IX_Orders_Status → get OrderIds
--             INTERSECT hai tập → key lookup cho rows trong cả hai

Index Covering vs Index Intersection

	Covering Index	Index Intersection
Cần	1 index với tất cả cột cần	Nhiều indexes
Hiệu suất	Thường tốt hơn	Có thêm cost merge
Flexibility	Index cụ thể cho query	Indexes có thể dùng nhiều query
Khi nào optimizer chọn	Khi covering index tồn tại	Khi không có covering index nhưng có nhiều partial indexes

10. In-Memory OLTP Indexes (Memory-Optimized Tables)

In-Memory OLTP tables (Hekaton) dùng các loại index khác hoàn toàn.

-- Tạo memory-optimized table
CREATE TABLE dbo.SessionCache (
    SessionId NVARCHAR(100) NOT NULL,
    UserId INT NOT NULL,
    Data NVARCHAR(MAX),
    CreatedAt DATETIME2 NOT NULL,
    CONSTRAINT PK_SessionCache PRIMARY KEY NONCLUSTERED HASH (SessionId)
        WITH (BUCKET_COUNT = 1000000)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

Hash Index (In-Memory)

-- Hash Index: O(1) point lookup, không hỗ trợ range
CONSTRAINT PK_Session PRIMARY KEY NONCLUSTERED HASH (SessionId)
    WITH (BUCKET_COUNT = 1000000);  -- Phải ước tính số distinct values

-- ✅ Tốt cho: = equality lookup
-- ❌ Không dùng được cho: range queries, ORDER BY

Range Index (Nonclustered, In-Memory)

-- Nonclustered range index: Bw-tree (lock-free B-tree)
-- Hỗ trợ range queries, ORDER BY
INDEX IX_Session_UserId NONCLUSTERED (UserId, CreatedAt)

	Hash Index	Range Index (Nonclustered)
Point lookup	O(1)	O(log N)
Range scan	❌ Không tốt	✅ Tốt
ORDER BY	❌ Không	✅ Có
Phù hợp	Cache, session lookups	Queries có range/order

11. Hypothetical Indexes (Index giả định)

SQL Server Database Engine Tuning Advisor (DTA) và một số tools dùng hypothetical indexes để test xem index có cải thiện query không mà không cần tạo thật.

-- Tạo hypothetical index (WITH STATISTICS_ONLY)
-- Tạo statistics nhưng KHÔNG tạo cấu trúc index thật
CREATE NONCLUSTERED INDEX IX_Orders_Test
ON Orders (CustomerId, OrderDate)
WITH STATISTICS_ONLY = -1;   -- Hypothetical

-- Bật chế độ cho phép optimizer dùng hypothetical indexes
DBCC AUTOPILOT(0, DB_ID(), OBJECT_ID('Orders'), ...);

-- Thực tế: Dùng Database Engine Tuning Advisor
-- 1. Capture workload (Profiler Trace hoặc Extended Events)
-- 2. Chạy DTA với workload
-- 3. DTA generates recommendations
-- 4. Review và apply các recommendations cần thiết

Thực tế: Hypothetical indexes hiếm khi dùng trực tiếp trong T-SQL. Chủ yếu là internal mechanism của DTA. Lập trình viên thường test bằng cách tạo index thật trên môi trường dev/staging.

Keyboard shortcuts

Learning