Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Monitoring & Diagnostics

Dynamic Management Views (DMVs)

DMV là các views và functions hệ thống cung cấp thông tin runtime về SQL Server — không cần third-party tools.

Phân loại DMVs

  • sys.dm_exec_*: query execution, sessions, requests
  • sys.dm_os_*: operating system, memory, waits, schedulers
  • sys.dm_tran_*: transactions và locks
  • sys.dm_io_*: I/O
  • sys.dm_db_*: database-level metrics
  • sys.dm_hadr_*: Always On AG health

1. sys.dm_exec_requests — Queries đang chạy

-- Xem tất cả queries đang thực thi
SELECT 
    r.session_id,
    r.status,
    r.start_time,
    r.command,
    DATEDIFF(SECOND, r.start_time, GETDATE()) AS duration_seconds,
    r.cpu_time,
    r.reads,
    r.writes,
    r.logical_reads,
    r.wait_type,
    r.wait_time,
    r.blocking_session_id,
    DB_NAME(r.database_id) AS database_name,
    SUBSTRING(t.text, 
        (r.statement_start_offset / 2) + 1,
        CASE r.statement_end_offset 
            WHEN -1 THEN DATALENGTH(t.text)
            ELSE r.statement_end_offset 
        END - r.statement_start_offset) / 2 + 1) AS current_statement,
    t.text AS full_query,
    qp.query_plan
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) qp
WHERE r.session_id > 50  -- Exclude system sessions
    AND r.status != 'background'
ORDER BY r.cpu_time DESC;

-- Tìm query đang bị block
SELECT 
    r.session_id,
    r.blocking_session_id,
    DATEDIFF(SECOND, r.start_time, GETDATE()) AS blocked_for_seconds,
    t.text AS blocked_query,
    r.wait_type
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id > 0;

2. sys.dm_exec_sessions — Connected Sessions

-- Xem tất cả sessions đang kết nối
SELECT 
    s.session_id,
    s.login_name,
    s.host_name,
    s.program_name,
    s.status,
    s.cpu_time,
    s.memory_usage * 8 AS memory_kb,
    s.total_elapsed_time / 1000 AS elapsed_seconds,
    s.last_request_start_time,
    s.reads,
    s.writes,
    s.logical_reads,
    DB_NAME(s.database_id) AS current_database,
    s.open_transaction_count
FROM sys.dm_exec_sessions s
WHERE s.is_user_process = 1  -- Chỉ user sessions, không phải system
ORDER BY s.cpu_time DESC;

-- Xem sessions với open transactions lâu (potential blocking)
SELECT 
    s.session_id,
    s.login_name,
    s.open_transaction_count,
    DATEDIFF(MINUTE, s.last_request_start_time, GETDATE()) AS idle_minutes,
    t.text AS last_query
FROM sys.dm_exec_sessions s
CROSS APPLY sys.dm_exec_sql_text(s.most_recent_sql_handle) t
WHERE s.open_transaction_count > 0
    AND s.is_user_process = 1
ORDER BY idle_minutes DESC;

3. sys.dm_exec_query_stats — Query Statistics từ Cache

-- Top 10 queries tiêu tốn CPU nhiều nhất
SELECT TOP 10
    qs.execution_count,
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
    qs.total_logical_reads,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
    qs.creation_time AS plan_created_time,
    SUBSTRING(t.text, 
        (qs.statement_start_offset / 2) + 1,
        (ISNULL(qs.statement_end_offset, DATALENGTH(t.text)) - 
         qs.statement_start_offset) / 2 + 1) AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC;

-- Top 10 queries tốn I/O (logical reads)
SELECT TOP 10
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    qs.execution_count,
    qs.total_logical_reads,
    SUBSTRING(t.text, 1, 200) AS query_snippet
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
ORDER BY avg_logical_reads DESC;

4. sys.dm_os_wait_stats — Wait Statistics

Wait statistics là cách tốt nhất để chẩn đoán bottleneck.

-- Top wait types (loại trừ idle waits)
SELECT TOP 20
    wait_type,
    wait_time_ms / 1000.0 AS wait_time_seconds,
    waiting_tasks_count,
    wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms,
    CAST(wait_time_ms * 100.0 / SUM(wait_time_ms) OVER() AS DECIMAL(5,2)) AS pct_total
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
    -- Idle/benign waits - loại trừ
    'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'BROKER_TO_FLUSH',
    'BROKER_TASK_STOP', 'CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT',
    'DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
    'HADR_FILESTREAM_IOMGR_IOCOMPLETION', 'HADR_WORK_QUEUE',
    'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',
    'REQUEST_FOR_DEADLOCK_SEARCH', 'RESOURCE_QUEUE', 
    'SERVER_IDLE_CHECK', 'SLEEP_DBSTARTUP', 'SLEEP_DCOMSTARTUP',
    'SLEEP_MASTERDBREADY', 'SLEEP_MASTERMDREADY', 'SLEEP_MASTERUPGRADED',
    'SLEEP_MSDBSTARTUP', 'SLEEP_TEMPDBSTARTUP', 'SNI_HTTP_ACCEPT',
    'SP_SERVER_DIAGNOSTICS_SLEEP', 'SQLTRACE_BUFFER_FLUSH',
    'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', 'WAITFOR',
    'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', 'XE_DISPATCHER_WAIT',
    'XE_TIMER_EVENT', 'BROKER_EVENTHANDLER', 'CHECKPOINT_QUEUE',
    'DBMIRROR_EVENTS_QUEUE', 'SQLTRACE_WAIT_ENTRIES',
    'WAIT_XTP_CKPT_CLOSE', 'XE_DISPATCHER_JOIN'
)
ORDER BY wait_time_ms DESC;

-- Reset wait stats (để measure từ đầu)
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);

Phân tích các Wait Types phổ biến

Wait TypeÝ nghĩaGiải pháp
PAGEIOLATCH_SH/EXĐọc/ghi trang từ diskThiếu RAM, I/O bottleneck, thiếu index
LCK_M_X, LCK_M_SChờ lock (blocking)Blocking chains, long transactions, thiếu index
CXPACKETParallel query waitsĐiều chỉnh MAXDOP, Cost Threshold for Parallelism
SOS_SCHEDULER_YIELDCPU pressureCPU overload, nhiều CPU-intensive queries
WRITELOGChờ log flushSlow disk (log file), high write workload
ASYNC_NETWORK_IOClient đọc dữ liệu chậmứng dụng xử lý kết quả chậm, large result sets
RESOURCE_SEMAPHOREChờ memory grantMemory pressure, sort/hash operations lớn
THREADPOOLKhông đủ workersCPU pressure, too many connections
PAGELATCH_*In-memory page contentionHotspot pages (GAM, PFS, identity columns)

5. sys.dm_os_waiting_tasks — Tasks đang chờ hiện tại

-- Xem blocking chains chi tiết
WITH BlockingChain AS (
    SELECT 
        wt.waiting_task_address,
        wt.session_id,
        wt.wait_type,
        wt.wait_duration_ms,
        wt.blocking_session_id,
        wt.resource_description,
        es.login_name,
        SUBSTRING(t.text, 1, 100) AS query_snippet
    FROM sys.dm_os_waiting_tasks wt
    JOIN sys.dm_exec_sessions es ON wt.session_id = es.session_id
    CROSS APPLY sys.dm_exec_sql_text(es.most_recent_sql_handle) t
    WHERE wt.blocking_session_id IS NOT NULL
)
SELECT 
    session_id AS blocked_session,
    blocking_session_id AS blocker,
    wait_type,
    wait_duration_ms / 1000.0 AS wait_seconds,
    login_name,
    query_snippet
FROM BlockingChain
ORDER BY wait_duration_ms DESC;

6. sys.dm_tran_locks — Locks hiện tại

-- Xem tất cả locks đang giữ
SELECT 
    l.resource_type,
    l.resource_database_id,
    l.resource_associated_entity_id,
    OBJECT_NAME(l.resource_associated_entity_id, l.resource_database_id) AS object_name,
    l.request_mode,   -- S, X, U, IS, IX, SIX, Sch-S, Sch-M
    l.request_status, -- GRANT, WAIT, CONVERT
    l.request_session_id,
    l.request_owner_type
FROM sys.dm_tran_locks l
WHERE l.resource_database_id = DB_ID()
    AND l.resource_type != 'DATABASE'
ORDER BY l.request_session_id, l.request_mode;

-- Xem deadlock information từ system_health XE session
SELECT 
    xdr.value('@timestamp', 'DATETIME2') AS deadlock_time,
    xdr.query('.') AS deadlock_graph
FROM (
    SELECT CAST(target_data AS XML) AS target_data
    FROM sys.dm_xe_session_targets t
    JOIN sys.dm_xe_sessions s ON t.event_session_address = s.address
    WHERE s.name = N'system_health'
      AND t.target_name = N'ring_buffer'
) AS data
CROSS APPLY target_data.nodes('//RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData(xdr)
ORDER BY deadlock_time DESC;

7. sys.dm_io_virtual_file_stats — I/O per File

-- I/O statistics per database file
SELECT 
    DB_NAME(fs.database_id) AS database_name,
    mf.name AS file_name,
    mf.physical_name,
    mf.type_desc,
    fs.io_stall_read_ms,
    fs.num_of_reads,
    CASE fs.num_of_reads 
        WHEN 0 THEN 0 
        ELSE fs.io_stall_read_ms / fs.num_of_reads 
    END AS avg_read_ms,
    fs.io_stall_write_ms,
    fs.num_of_writes,
    CASE fs.num_of_writes 
        WHEN 0 THEN 0 
        ELSE fs.io_stall_write_ms / fs.num_of_writes 
    END AS avg_write_ms,
    fs.io_stall,
    fs.num_of_bytes_read / 1024 / 1024 AS mb_read,
    fs.num_of_bytes_written / 1024 / 1024 AS mb_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) fs
JOIN sys.master_files mf ON fs.database_id = mf.database_id 
    AND fs.file_id = mf.file_id
ORDER BY fs.io_stall DESC;

Rule of thumb: Avg read latency > 20ms → I/O bottleneck; > 50ms → nghiêm trọng.


8. sys.dm_db_index_usage_stats — Index Usage

-- Indexes ít được dùng (candidates để drop)
SELECT 
    OBJECT_NAME(ius.object_id) AS table_name,
    i.name AS index_name,
    i.type_desc,
    ius.user_seeks,
    ius.user_scans,
    ius.user_lookups,
    ius.user_updates,  -- Maintenance cost
    ius.last_user_seek,
    ius.last_user_scan
FROM sys.dm_db_index_usage_stats ius
JOIN sys.indexes i ON ius.object_id = i.object_id 
    AND ius.index_id = i.index_id
WHERE ius.database_id = DB_ID()
    AND i.type > 0  -- Không phải heap
    AND ius.user_seeks + ius.user_scans + ius.user_lookups = 0  -- Không được dùng
    AND ius.user_updates > 0  -- Nhưng vẫn tốn maintenance cost
ORDER BY ius.user_updates DESC;

-- Index được dùng nhiều nhất
SELECT TOP 20
    OBJECT_NAME(ius.object_id) AS table_name,
    i.name AS index_name,
    ius.user_seeks + ius.user_scans + ius.user_lookups AS total_reads,
    ius.user_updates AS total_writes,
    CAST(100.0 * ius.user_seeks / 
        NULLIF(ius.user_seeks + ius.user_scans + ius.user_lookups, 0) 
        AS DECIMAL(5,2)) AS seek_pct
FROM sys.dm_db_index_usage_stats ius
JOIN sys.indexes i ON ius.object_id = i.object_id 
    AND ius.index_id = i.index_id
WHERE ius.database_id = DB_ID()
ORDER BY total_reads DESC;

9. sys.dm_db_missing_index_details — Missing Index Suggestions

-- Missing indexes được SQL Server đề xuất
SELECT TOP 20
    DB_NAME(mid.database_id) AS database_name,
    OBJECT_NAME(mid.object_id, mid.database_id) AS table_name,
    mig.avg_user_impact,  -- % improvement estimate
    mig.user_seeks,
    mig.user_scans,
    mig.avg_total_user_cost,  -- Average query cost without index
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    -- Câu lệnh tạo index gợi ý
    'CREATE NONCLUSTERED INDEX IX_' + 
        OBJECT_NAME(mid.object_id, mid.database_id) + '_' +
        REPLACE(REPLACE(ISNULL(mid.equality_columns, '') + 
            ISNULL(mid.inequality_columns, ''), '[', ''), ']', '') +
    ' ON ' + mid.statement + 
    ' (' + ISNULL(mid.equality_columns, '') + 
        CASE WHEN mid.inequality_columns IS NOT NULL THEN 
            CASE WHEN mid.equality_columns IS NOT NULL THEN ', ' ELSE '' END + 
            mid.inequality_columns 
        ELSE '' END + ')' + 
    CASE WHEN mid.included_columns IS NOT NULL THEN 
        ' INCLUDE (' + mid.included_columns + ')' 
    ELSE '' END AS suggested_create_index
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
WHERE mid.database_id = DB_ID()
ORDER BY mig.avg_user_impact * mig.user_seeks DESC;

Cảnh báo: Đây chỉ là gợi ý — KHÔNG tạo tất cả missing indexes đề xuất. Validate với workload thực tế.


Query Store (SQL Server 2016+)

Query Store tự động collect query plans, runtime stats — không cần traces.

Bật Query Store

-- Enable Query Store
ALTER DATABASE ProductionDB
SET QUERY_STORE = ON (
    OPERATION_MODE = READ_WRITE,
    CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
    DATA_FLUSH_INTERVAL_SECONDS = 900,
    INTERVAL_LENGTH_MINUTES = 60,
    MAX_STORAGE_SIZE_MB = 1000,
    QUERY_CAPTURE_MODE = AUTO,  -- AUTO = chỉ capture queries đáng kể
    SIZE_BASED_CLEANUP_MODE = AUTO
);

-- Xem trạng thái
SELECT * FROM sys.database_query_store_options;

Phát hiện Plan Regression

-- Tìm queries có plan regression (plan mới chậm hơn plan cũ)
WITH PlanStats AS (
    SELECT 
        q.query_id,
        qt.query_sql_text,
        p.plan_id,
        rs.avg_duration,
        rs.avg_cpu_time,
        rs.avg_logical_io_reads,
        rs.count_executions,
        rs.first_execution_time,
        ROW_NUMBER() OVER (PARTITION BY q.query_id 
                           ORDER BY rs.avg_duration) AS plan_rank
    FROM sys.query_store_query q
    JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
    JOIN sys.query_store_plan p ON q.query_id = p.query_id
    JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
    JOIN sys.query_store_runtime_stats_interval rsi 
        ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
    WHERE rsi.start_time >= DATEADD(HOUR, -24, GETUTCDATE())
)
SELECT 
    ps_fast.query_id,
    SUBSTRING(ps_fast.query_sql_text, 1, 100) AS query_snippet,
    ps_fast.plan_id AS fast_plan_id,
    ps_fast.avg_duration AS fast_plan_avg_duration_us,
    ps_slow.plan_id AS current_plan_id,
    ps_slow.avg_duration AS current_plan_avg_duration_us,
    CAST(ps_slow.avg_duration / NULLIF(ps_fast.avg_duration, 0) AS DECIMAL(10,2)) AS regression_ratio
FROM PlanStats ps_fast
JOIN PlanStats ps_slow ON ps_fast.query_id = ps_slow.query_id
WHERE ps_fast.plan_rank = 1
    AND ps_slow.plan_rank > 1
    AND ps_slow.avg_duration > ps_fast.avg_duration * 2  -- 2x slower = regression
ORDER BY regression_ratio DESC;

-- Force một plan cụ thể
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;

-- Unforce plan
EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;

-- Top consuming queries trong Query Store
SELECT TOP 10
    q.query_id,
    SUBSTRING(qt.query_sql_text, 1, 100) AS query_snippet,
    SUM(rs.avg_duration * rs.count_executions) AS total_duration_us,
    SUM(rs.count_executions) AS total_executions,
    SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_us
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_duration_us DESC;

Extended Events

Extended Events (XE) là hệ thống tracing nhẹ, thay thế SQL Server Profiler.

-- Tạo XE session để capture slow queries
CREATE EVENT SESSION [CaptureSlowQueries] ON SERVER
ADD EVENT sqlserver.sql_statement_completed(
    WHERE (
        [duration] > 1000000  -- > 1 giây (microseconds)
        AND [cpu_time] > 100000  -- > 100ms CPU
    )
    ACTION(
        sqlserver.sql_text,
        sqlserver.query_hash,
        sqlserver.query_plan_hash,
        sqlserver.client_app_name,
        sqlserver.username,
        sqlserver.database_name
    )
),
ADD EVENT sqlserver.rpc_completed(
    WHERE ([duration] > 1000000)
    ACTION(sqlserver.sql_text, sqlserver.database_name)
)
ADD TARGET package0.event_file(
    SET filename = N'D:\XEvents\SlowQueries.xel',
    max_file_size = 500,   -- 500 MB per file
    max_rollover_files = 5  -- Keep last 5 files
)
WITH (
    MAX_MEMORY = 50 MB,
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 5 SECONDS,
    MAX_EVENT_SIZE = 0 KB,
    MEMORY_PARTITION_MODE = NONE,
    TRACK_CAUSALITY = OFF,
    STARTUP_STATE = OFF
);

-- Start session
ALTER EVENT SESSION [CaptureSlowQueries] ON SERVER STATE = START;

-- Tạo session để capture deadlocks (thay thế cho Profiler trace)
CREATE EVENT SESSION [CaptureDeadlocks] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.ring_buffer(SET max_memory = 51200)
WITH (STARTUP_STATE = ON);  -- Auto-start với SQL Server

ALTER EVENT SESSION [CaptureDeadlocks] ON SERVER STATE = START;

-- Đọc deadlock data từ ring buffer
SELECT 
    xdr.value('@timestamp', 'DATETIME2') AS deadlock_time,
    xdr.query('.') AS deadlock_xml
FROM (
    SELECT CAST(target_data AS XML) AS target_data
    FROM sys.dm_xe_session_targets t
    JOIN sys.dm_xe_sessions s ON t.event_session_address = s.address
    WHERE s.name = N'CaptureDeadlocks'
      AND t.target_name = N'ring_buffer'
) AS data
CROSS APPLY target_data.nodes('//RingBufferTarget/event[@name="xml_deadlock_report"]') XEventData(xdr)
ORDER BY deadlock_time DESC;

-- Đọc XE file
SELECT 
    event_data.value('(event/@name)[1]', 'NVARCHAR(50)') AS event_name,
    event_data.value('(event/@timestamp)[1]', 'DATETIME2') AS event_time,
    event_data.value('(event/data[@name="duration"]/value)[1]', 'BIGINT') / 1000 AS duration_ms,
    event_data.value('(event/action[@name="sql_text"]/value)[1]', 'NVARCHAR(MAX)') AS sql_text
FROM (
    SELECT CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file(
        'D:\XEvents\SlowQueries*.xel', NULL, NULL, NULL)
) AS XEData
ORDER BY event_time DESC;

-- Stop và drop session
ALTER EVENT SESSION [CaptureSlowQueries] ON SERVER STATE = STOP;
DROP EVENT SESSION [CaptureSlowQueries] ON SERVER;

SQL Server Error Log

-- Đọc error log hiện tại
EXEC xp_readerrorlog 0, 1;  -- 0=current log, 1=error log (2=agent log)

-- Filter by keyword và time range
EXEC xp_readerrorlog 
    0,           -- Log file number (0=current)
    1,           -- Log type (1=SQL, 2=Agent)
    N'Error',    -- Search string 1
    NULL,        -- Search string 2
    '2026-04-01 00:00:00',  -- Start time
    '2026-04-01 23:59:59',  -- End time
    N'DESC';     -- Sort order

-- Cycle error log (tạo log mới, archive cũ)
EXEC sp_cycle_errorlog;

-- Xem số lượng error logs được giữ
EXEC xp_instance_regread 
    N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\MSSQLServer',
    N'NumErrorLogs';
-- Config qua SSMS: Management > SQL Server Logs > Configure

Windows Performance Monitor Counters

Key SQL Server PerfMon counters cần monitor:

CounterNormalCảnh báoNghiêm trọng
Buffer Cache Hit Ratio> 99%95-99%< 95%
Page Life Expectancy> 300s (tốt hơn > 1000s)200-300s< 200s
Batch Requests/secBaseline20% trên baseline50% trên baseline
SQL Compilations/sec< 10% của Batch Req10-20%> 20% (thiếu reuse/parameterization)
SQL Re-Compilations/sec< 10% của Compilations
Lock Waits/sec0 ideally> 0 thường xuyên = blocking
Deadlocks/sec0AnyNhiều = design issue
Checkpoint Pages/secVariesCao liên tục = dirty page pressure
Lazy Writes/sec0> 0 = memory pressure
-- Lấy PLE qua DMV (không cần PerfMon external)
SELECT 
    object_name,
    counter_name,
    instance_name,
    cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
    'Page life expectancy',
    'Batch Requests/sec',
    'SQL Compilations/sec',
    'SQL Re-Compilations/sec',
    'Lock Waits/sec',
    'Full Scans/sec',
    'Index Searches/sec',
    'Deadlocks/sec',
    'Buffer cache hit ratio'
)
ORDER BY object_name, counter_name;

Activity Monitor

Activity Monitor trong SSMS (Right-click server → Activity Monitor) cung cấp:

  • Overview: CPU, waits, I/O, batch requests/sec (real-time chart)
  • Processes: active sessions, blocking chains
  • Resource Waits: wait types heatmap
  • Data File I/O: read/write per file
  • Recent Expensive Queries: top queries hiện tại

Lưu ý: Activity Monitor có overhead nhất định, không nên để mở liên tục trên production.


Thiết lập Performance Baseline

-- Script thu thập baseline (chạy theo lịch để trend analysis)
CREATE TABLE dbo.PerformanceBaseline (
    CaptureTime     DATETIME2 DEFAULT SYSDATETIME(),
    CounterName     NVARCHAR(128),
    InstanceName    NVARCHAR(128),
    CounterValue    BIGINT
);

-- Insert snapshot
INSERT INTO dbo.PerformanceBaseline (CounterName, InstanceName, CounterValue)
SELECT counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
    'Page life expectancy', 
    'Batch Requests/sec',
    'SQL Compilations/sec'
);

-- Wait stats snapshot
CREATE TABLE dbo.WaitStatsBaseline (
    CaptureTime     DATETIME2 DEFAULT SYSDATETIME(),
    WaitType        NVARCHAR(60),
    WaitTimeMs      BIGINT,
    WaitingTasks    BIGINT
);

INSERT INTO dbo.WaitStatsBaseline (WaitType, WaitTimeMs, WaitingTasks)
SELECT wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'WAITFOR');

-- So sánh 2 snapshots để tính delta
SELECT 
    curr.WaitType,
    curr.WaitTimeMs - prev.WaitTimeMs AS delta_wait_ms,
    curr.WaitingTasks - prev.WaitingTasks AS delta_tasks
FROM dbo.WaitStatsBaseline curr
JOIN dbo.WaitStatsBaseline prev ON curr.WaitType = prev.WaitType
WHERE curr.CaptureTime = '2026-04-01 12:00:00'
    AND prev.CaptureTime = '2026-04-01 11:00:00'
ORDER BY delta_wait_ms DESC;

Q&A - Phỏng vấn Monitoring & Diagnostics

Junior Level

Q1: DMV là gì? Tại sao quan trọng?

Dynamic Management Views (DMVs) là các views hệ thống trong SQL Server cung cấp thông tin runtime về server health, query performance, waits, locks. Quan trọng vì:

  • Real-time diagnostic mà không cần external tools
  • Phát hiện blocking, long-running queries, missing indexes
  • Không cần đặc quyền cao (một số DMV chỉ cần VIEW SERVER STATE)
  • Dữ liệu accumulate từ khi SQL Server restart (trừ một số reset khi plan cache clear)

Q2: Làm thế nào để tìm query đang blocking trong SQL Server?

SELECT r.session_id, r.blocking_session_id, t.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id > 0;

Sau đó dùng KILL session_id để giải phóng nếu thực sự cần.

Q3: Sự khác biệt giữa Extended Events và SQL Server Profiler?

Extended EventsSQL Server Profiler
Performance overheadRất thấpCao (không dùng trên production)
GranularityRất chi tiếtHạn chế hơn
Storage targetsRing buffer, file, ETWFile, table
Real-time
StatusHiện tại (recommended)Legacy (deprecated)

Mid Level

Q4: Giải thích wait statistics. Làm thế nào dùng để chẩn đoán bottleneck?

Khi SQL Server cần tài nguyên nhưng không available (CPU, I/O, lock, memory), task sẽ wait và ghi vào wait stats. Analyze top wait types để biết bottleneck:

  • PAGEIOLATCH_*: data pages đang đọc từ disk → thêm RAM hoặc index
  • LCK_M_*: blocking → tìm blocker, tối ưu transactions
  • CXPACKET: parallel queries đang sync → điều chỉnh MAXDOP
  • RESOURCE_SEMAPHORE: sort/hash cần memory grant → tối ưu queries, thêm RAM

Q5: Query Store khác gì với sys.dm_exec_query_stats?

Query Storesys.dm_exec_query_stats
PersistencePersisted trong databaseLost khi plan evicted from cache
HistoryConfigurable (days/weeks)Chỉ plans còn trong cache
Plan comparisonCó thể compare nhiều plans per queryMột plan per entry
Force planKhông
OverheadNhỏNone (passive)

Q6: Page Life Expectancy (PLE) là gì? Giá trị bao nhiêu là tốt?

PLE đo số giây trung bình một data page ở trong buffer pool trước khi bị evict. PLE thấp = pages thường xuyên phải đọc lại từ disk → I/O pressure.

  • Traditional threshold: > 300 giây
  • Modern guidance: baseline thường > 1000 giây; cảnh báo khi drop > 50% so với baseline
  • Với nhiều NUMA nodes: PLE per NUMA node quan trọng hơn global PLE

Senior Level

Q7: Xây dựng monitoring strategy cho SQL Server production. Những gì cần monitor và alert?

Real-time alerts (immediate response):

  • Deadlocks/sec > 0 → XE capture + notify DBA
  • Blocking duration > 5 phút → auto-kill hoặc notify
  • CPU > 90% trong 5+ phút
  • Error log: severity 17+ errors

Trending metrics (collect baseline, alert on deviation):

  • PLE trend (daily average, alert if < 300s)
  • Wait stats delta (hourly snapshot comparison)
  • I/O latency per file (> 20ms read = investigate)
  • Disk free space (< 20% = warning, < 10% = critical)
  • Log space usage (per database)
  • TempDB space usage

Capacity planning (weekly/monthly):

  • Database growth rate
  • Index fragmentation trends
  • Missing index opportunities
  • Top growing tables

Tools: SQL Server Central Monitoring Repository (custom), Ola Hallengren + alerts, third-party (SolarWinds DPA, SentryOne/SQL Sentry, Redgate SQL Monitor)

Q8: Làm thế nào để diagnose một query đột ngột chậm hơn mà không reproduce được?

  1. Query Store (nếu đang bật): tìm query_id, so sánh plans, elapsed time trend theo thời gian. Nếu plan regression → force old plan
  2. sys.dm_exec_query_stats: xem plan_generation_num (số lần recompile), execution count, avg duration
  3. Extended Events: capture query_post_execution_showplan để xem actual plan
  4. Wait stats per query: dùng SET STATISTICS IO, TIME ON để reproduce
  5. Parameter sniffing: thử OPTION (RECOMPILE) hoặc OPTION (OPTIMIZE FOR UNKNOWN)
  6. Statistics: kiểm tra last_updated của statistics trên các tables liên quan — outdated stats → bad estimates → bad plan
  7. Schema changes: ai đó drop/create index, thay đổi table structure

Q9: Explain tại sao CXPACKET wait cao không nhất thiết là vấn đề. Phân biệt CXPACKET “healthy” và “unhealthy”?

CXPACKET là wait type khi các threads trong parallel query plan đang sync (exchange operators). SQL Server phải balance work giữa các threads.

“Healthy” CXPACKET:

  • Parallel query đang chạy nhanh, threads sync là bình thường
  • CXPACKET is high nhưng query duration ngắn

“Unhealthy” CXPACKET:

  • Thread skew: một thread xử lý 90% work, các thread khác chờ → CXPACKET cao + query chậm
  • Thường kèm SOS_SCHEDULER_YIELD (CPU pressure)
  • Nguyên nhân: data skew, bad statistics, non-sargable predicates

Giải pháp:

  • Điều chỉnh Cost Threshold for Parallelism (mặc định 5, thường tăng lên 50-75)
  • MAXDOP per query: OPTION (MAXDOP 4) để limit parallelism
  • Fix statistics và data skew
  • SQL Server 2019+: Intelligent Query Processing tự adjust parallelism

Từ SQL Server 2016 SP2+: CXCONSUMER (idle threads waiting) thay thế cho phần idle của CXPACKET, giúp phân tích chính xác hơn.