Monitoring & Diagnostics
Dynamic Management Views (DMVs)
DMV là các views và functions hệ thống cung cấp thông tin runtime về SQL Server — không cần third-party tools.
Phân loại DMVs
sys.dm_exec_*: query execution, sessions, requestssys.dm_os_*: operating system, memory, waits, schedulerssys.dm_tran_*: transactions và lockssys.dm_io_*: I/Osys.dm_db_*: database-level metricssys.dm_hadr_*: Always On AG health
1. sys.dm_exec_requests — Queries đang chạy
-- Xem tất cả queries đang thực thi
SELECT
r.session_id,
r.status,
r.start_time,
r.command,
DATEDIFF(SECOND, r.start_time, GETDATE()) AS duration_seconds,
r.cpu_time,
r.reads,
r.writes,
r.logical_reads,
r.wait_type,
r.wait_time,
r.blocking_session_id,
DB_NAME(r.database_id) AS database_name,
SUBSTRING(t.text,
(r.statement_start_offset / 2) + 1,
CASE r.statement_end_offset
WHEN -1 THEN DATALENGTH(t.text)
ELSE r.statement_end_offset
END - r.statement_start_offset) / 2 + 1) AS current_statement,
t.text AS full_query,
qp.query_plan
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) qp
WHERE r.session_id > 50 -- Exclude system sessions
AND r.status != 'background'
ORDER BY r.cpu_time DESC;
-- Tìm query đang bị block
SELECT
r.session_id,
r.blocking_session_id,
DATEDIFF(SECOND, r.start_time, GETDATE()) AS blocked_for_seconds,
t.text AS blocked_query,
r.wait_type
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id > 0;
2. sys.dm_exec_sessions — Connected Sessions
-- Xem tất cả sessions đang kết nối
SELECT
s.session_id,
s.login_name,
s.host_name,
s.program_name,
s.status,
s.cpu_time,
s.memory_usage * 8 AS memory_kb,
s.total_elapsed_time / 1000 AS elapsed_seconds,
s.last_request_start_time,
s.reads,
s.writes,
s.logical_reads,
DB_NAME(s.database_id) AS current_database,
s.open_transaction_count
FROM sys.dm_exec_sessions s
WHERE s.is_user_process = 1 -- Chỉ user sessions, không phải system
ORDER BY s.cpu_time DESC;
-- Xem sessions với open transactions lâu (potential blocking)
SELECT
s.session_id,
s.login_name,
s.open_transaction_count,
DATEDIFF(MINUTE, s.last_request_start_time, GETDATE()) AS idle_minutes,
t.text AS last_query
FROM sys.dm_exec_sessions s
CROSS APPLY sys.dm_exec_sql_text(s.most_recent_sql_handle) t
WHERE s.open_transaction_count > 0
AND s.is_user_process = 1
ORDER BY idle_minutes DESC;
3. sys.dm_exec_query_stats — Query Statistics từ Cache
-- Top 10 queries tiêu tốn CPU nhiều nhất
SELECT TOP 10
qs.execution_count,
qs.total_worker_time / 1000 AS total_cpu_ms,
qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
qs.total_logical_reads,
qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
qs.total_elapsed_time / 1000 AS total_elapsed_ms,
qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
qs.creation_time AS plan_created_time,
SUBSTRING(t.text,
(qs.statement_start_offset / 2) + 1,
(ISNULL(qs.statement_end_offset, DATALENGTH(t.text)) -
qs.statement_start_offset) / 2 + 1) AS query_text,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC;
-- Top 10 queries tốn I/O (logical reads)
SELECT TOP 10
qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
qs.execution_count,
qs.total_logical_reads,
SUBSTRING(t.text, 1, 200) AS query_snippet
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) t
ORDER BY avg_logical_reads DESC;
4. sys.dm_os_wait_stats — Wait Statistics
Wait statistics là cách tốt nhất để chẩn đoán bottleneck.
-- Top wait types (loại trừ idle waits)
SELECT TOP 20
wait_type,
wait_time_ms / 1000.0 AS wait_time_seconds,
waiting_tasks_count,
wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms,
CAST(wait_time_ms * 100.0 / SUM(wait_time_ms) OVER() AS DECIMAL(5,2)) AS pct_total
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
-- Idle/benign waits - loại trừ
'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'BROKER_TO_FLUSH',
'BROKER_TASK_STOP', 'CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT',
'DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'HADR_FILESTREAM_IOMGR_IOCOMPLETION', 'HADR_WORK_QUEUE',
'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',
'REQUEST_FOR_DEADLOCK_SEARCH', 'RESOURCE_QUEUE',
'SERVER_IDLE_CHECK', 'SLEEP_DBSTARTUP', 'SLEEP_DCOMSTARTUP',
'SLEEP_MASTERDBREADY', 'SLEEP_MASTERMDREADY', 'SLEEP_MASTERUPGRADED',
'SLEEP_MSDBSTARTUP', 'SLEEP_TEMPDBSTARTUP', 'SNI_HTTP_ACCEPT',
'SP_SERVER_DIAGNOSTICS_SLEEP', 'SQLTRACE_BUFFER_FLUSH',
'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', 'WAITFOR',
'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', 'XE_DISPATCHER_WAIT',
'XE_TIMER_EVENT', 'BROKER_EVENTHANDLER', 'CHECKPOINT_QUEUE',
'DBMIRROR_EVENTS_QUEUE', 'SQLTRACE_WAIT_ENTRIES',
'WAIT_XTP_CKPT_CLOSE', 'XE_DISPATCHER_JOIN'
)
ORDER BY wait_time_ms DESC;
-- Reset wait stats (để measure từ đầu)
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
Phân tích các Wait Types phổ biến
| Wait Type | Ý nghĩa | Giải pháp |
|---|---|---|
PAGEIOLATCH_SH/EX | Đọc/ghi trang từ disk | Thiếu RAM, I/O bottleneck, thiếu index |
LCK_M_X, LCK_M_S | Chờ lock (blocking) | Blocking chains, long transactions, thiếu index |
CXPACKET | Parallel query waits | Điều chỉnh MAXDOP, Cost Threshold for Parallelism |
SOS_SCHEDULER_YIELD | CPU pressure | CPU overload, nhiều CPU-intensive queries |
WRITELOG | Chờ log flush | Slow disk (log file), high write workload |
ASYNC_NETWORK_IO | Client đọc dữ liệu chậm | ứng dụng xử lý kết quả chậm, large result sets |
RESOURCE_SEMAPHORE | Chờ memory grant | Memory pressure, sort/hash operations lớn |
THREADPOOL | Không đủ workers | CPU pressure, too many connections |
PAGELATCH_* | In-memory page contention | Hotspot pages (GAM, PFS, identity columns) |
5. sys.dm_os_waiting_tasks — Tasks đang chờ hiện tại
-- Xem blocking chains chi tiết
WITH BlockingChain AS (
SELECT
wt.waiting_task_address,
wt.session_id,
wt.wait_type,
wt.wait_duration_ms,
wt.blocking_session_id,
wt.resource_description,
es.login_name,
SUBSTRING(t.text, 1, 100) AS query_snippet
FROM sys.dm_os_waiting_tasks wt
JOIN sys.dm_exec_sessions es ON wt.session_id = es.session_id
CROSS APPLY sys.dm_exec_sql_text(es.most_recent_sql_handle) t
WHERE wt.blocking_session_id IS NOT NULL
)
SELECT
session_id AS blocked_session,
blocking_session_id AS blocker,
wait_type,
wait_duration_ms / 1000.0 AS wait_seconds,
login_name,
query_snippet
FROM BlockingChain
ORDER BY wait_duration_ms DESC;
6. sys.dm_tran_locks — Locks hiện tại
-- Xem tất cả locks đang giữ
SELECT
l.resource_type,
l.resource_database_id,
l.resource_associated_entity_id,
OBJECT_NAME(l.resource_associated_entity_id, l.resource_database_id) AS object_name,
l.request_mode, -- S, X, U, IS, IX, SIX, Sch-S, Sch-M
l.request_status, -- GRANT, WAIT, CONVERT
l.request_session_id,
l.request_owner_type
FROM sys.dm_tran_locks l
WHERE l.resource_database_id = DB_ID()
AND l.resource_type != 'DATABASE'
ORDER BY l.request_session_id, l.request_mode;
-- Xem deadlock information từ system_health XE session
SELECT
xdr.value('@timestamp', 'DATETIME2') AS deadlock_time,
xdr.query('.') AS deadlock_graph
FROM (
SELECT CAST(target_data AS XML) AS target_data
FROM sys.dm_xe_session_targets t
JOIN sys.dm_xe_sessions s ON t.event_session_address = s.address
WHERE s.name = N'system_health'
AND t.target_name = N'ring_buffer'
) AS data
CROSS APPLY target_data.nodes('//RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData(xdr)
ORDER BY deadlock_time DESC;
7. sys.dm_io_virtual_file_stats — I/O per File
-- I/O statistics per database file
SELECT
DB_NAME(fs.database_id) AS database_name,
mf.name AS file_name,
mf.physical_name,
mf.type_desc,
fs.io_stall_read_ms,
fs.num_of_reads,
CASE fs.num_of_reads
WHEN 0 THEN 0
ELSE fs.io_stall_read_ms / fs.num_of_reads
END AS avg_read_ms,
fs.io_stall_write_ms,
fs.num_of_writes,
CASE fs.num_of_writes
WHEN 0 THEN 0
ELSE fs.io_stall_write_ms / fs.num_of_writes
END AS avg_write_ms,
fs.io_stall,
fs.num_of_bytes_read / 1024 / 1024 AS mb_read,
fs.num_of_bytes_written / 1024 / 1024 AS mb_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) fs
JOIN sys.master_files mf ON fs.database_id = mf.database_id
AND fs.file_id = mf.file_id
ORDER BY fs.io_stall DESC;
Rule of thumb: Avg read latency > 20ms → I/O bottleneck; > 50ms → nghiêm trọng.
8. sys.dm_db_index_usage_stats — Index Usage
-- Indexes ít được dùng (candidates để drop)
SELECT
OBJECT_NAME(ius.object_id) AS table_name,
i.name AS index_name,
i.type_desc,
ius.user_seeks,
ius.user_scans,
ius.user_lookups,
ius.user_updates, -- Maintenance cost
ius.last_user_seek,
ius.last_user_scan
FROM sys.dm_db_index_usage_stats ius
JOIN sys.indexes i ON ius.object_id = i.object_id
AND ius.index_id = i.index_id
WHERE ius.database_id = DB_ID()
AND i.type > 0 -- Không phải heap
AND ius.user_seeks + ius.user_scans + ius.user_lookups = 0 -- Không được dùng
AND ius.user_updates > 0 -- Nhưng vẫn tốn maintenance cost
ORDER BY ius.user_updates DESC;
-- Index được dùng nhiều nhất
SELECT TOP 20
OBJECT_NAME(ius.object_id) AS table_name,
i.name AS index_name,
ius.user_seeks + ius.user_scans + ius.user_lookups AS total_reads,
ius.user_updates AS total_writes,
CAST(100.0 * ius.user_seeks /
NULLIF(ius.user_seeks + ius.user_scans + ius.user_lookups, 0)
AS DECIMAL(5,2)) AS seek_pct
FROM sys.dm_db_index_usage_stats ius
JOIN sys.indexes i ON ius.object_id = i.object_id
AND ius.index_id = i.index_id
WHERE ius.database_id = DB_ID()
ORDER BY total_reads DESC;
9. sys.dm_db_missing_index_details — Missing Index Suggestions
-- Missing indexes được SQL Server đề xuất
SELECT TOP 20
DB_NAME(mid.database_id) AS database_name,
OBJECT_NAME(mid.object_id, mid.database_id) AS table_name,
mig.avg_user_impact, -- % improvement estimate
mig.user_seeks,
mig.user_scans,
mig.avg_total_user_cost, -- Average query cost without index
mid.equality_columns,
mid.inequality_columns,
mid.included_columns,
-- Câu lệnh tạo index gợi ý
'CREATE NONCLUSTERED INDEX IX_' +
OBJECT_NAME(mid.object_id, mid.database_id) + '_' +
REPLACE(REPLACE(ISNULL(mid.equality_columns, '') +
ISNULL(mid.inequality_columns, ''), '[', ''), ']', '') +
' ON ' + mid.statement +
' (' + ISNULL(mid.equality_columns, '') +
CASE WHEN mid.inequality_columns IS NOT NULL THEN
CASE WHEN mid.equality_columns IS NOT NULL THEN ', ' ELSE '' END +
mid.inequality_columns
ELSE '' END + ')' +
CASE WHEN mid.included_columns IS NOT NULL THEN
' INCLUDE (' + mid.included_columns + ')'
ELSE '' END AS suggested_create_index
FROM sys.dm_db_missing_index_details mid
JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
WHERE mid.database_id = DB_ID()
ORDER BY mig.avg_user_impact * mig.user_seeks DESC;
Cảnh báo: Đây chỉ là gợi ý — KHÔNG tạo tất cả missing indexes đề xuất. Validate với workload thực tế.
Query Store (SQL Server 2016+)
Query Store tự động collect query plans, runtime stats — không cần traces.
Bật Query Store
-- Enable Query Store
ALTER DATABASE ProductionDB
SET QUERY_STORE = ON (
OPERATION_MODE = READ_WRITE,
CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
DATA_FLUSH_INTERVAL_SECONDS = 900,
INTERVAL_LENGTH_MINUTES = 60,
MAX_STORAGE_SIZE_MB = 1000,
QUERY_CAPTURE_MODE = AUTO, -- AUTO = chỉ capture queries đáng kể
SIZE_BASED_CLEANUP_MODE = AUTO
);
-- Xem trạng thái
SELECT * FROM sys.database_query_store_options;
Phát hiện Plan Regression
-- Tìm queries có plan regression (plan mới chậm hơn plan cũ)
WITH PlanStats AS (
SELECT
q.query_id,
qt.query_sql_text,
p.plan_id,
rs.avg_duration,
rs.avg_cpu_time,
rs.avg_logical_io_reads,
rs.count_executions,
rs.first_execution_time,
ROW_NUMBER() OVER (PARTITION BY q.query_id
ORDER BY rs.avg_duration) AS plan_rank
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
JOIN sys.query_store_runtime_stats_interval rsi
ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE rsi.start_time >= DATEADD(HOUR, -24, GETUTCDATE())
)
SELECT
ps_fast.query_id,
SUBSTRING(ps_fast.query_sql_text, 1, 100) AS query_snippet,
ps_fast.plan_id AS fast_plan_id,
ps_fast.avg_duration AS fast_plan_avg_duration_us,
ps_slow.plan_id AS current_plan_id,
ps_slow.avg_duration AS current_plan_avg_duration_us,
CAST(ps_slow.avg_duration / NULLIF(ps_fast.avg_duration, 0) AS DECIMAL(10,2)) AS regression_ratio
FROM PlanStats ps_fast
JOIN PlanStats ps_slow ON ps_fast.query_id = ps_slow.query_id
WHERE ps_fast.plan_rank = 1
AND ps_slow.plan_rank > 1
AND ps_slow.avg_duration > ps_fast.avg_duration * 2 -- 2x slower = regression
ORDER BY regression_ratio DESC;
-- Force một plan cụ thể
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 7;
-- Unforce plan
EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;
-- Top consuming queries trong Query Store
SELECT TOP 10
q.query_id,
SUBSTRING(qt.query_sql_text, 1, 100) AS query_snippet,
SUM(rs.avg_duration * rs.count_executions) AS total_duration_us,
SUM(rs.count_executions) AS total_executions,
SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_us
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_duration_us DESC;
Extended Events
Extended Events (XE) là hệ thống tracing nhẹ, thay thế SQL Server Profiler.
-- Tạo XE session để capture slow queries
CREATE EVENT SESSION [CaptureSlowQueries] ON SERVER
ADD EVENT sqlserver.sql_statement_completed(
WHERE (
[duration] > 1000000 -- > 1 giây (microseconds)
AND [cpu_time] > 100000 -- > 100ms CPU
)
ACTION(
sqlserver.sql_text,
sqlserver.query_hash,
sqlserver.query_plan_hash,
sqlserver.client_app_name,
sqlserver.username,
sqlserver.database_name
)
),
ADD EVENT sqlserver.rpc_completed(
WHERE ([duration] > 1000000)
ACTION(sqlserver.sql_text, sqlserver.database_name)
)
ADD TARGET package0.event_file(
SET filename = N'D:\XEvents\SlowQueries.xel',
max_file_size = 500, -- 500 MB per file
max_rollover_files = 5 -- Keep last 5 files
)
WITH (
MAX_MEMORY = 50 MB,
EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY = 5 SECONDS,
MAX_EVENT_SIZE = 0 KB,
MEMORY_PARTITION_MODE = NONE,
TRACK_CAUSALITY = OFF,
STARTUP_STATE = OFF
);
-- Start session
ALTER EVENT SESSION [CaptureSlowQueries] ON SERVER STATE = START;
-- Tạo session để capture deadlocks (thay thế cho Profiler trace)
CREATE EVENT SESSION [CaptureDeadlocks] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.ring_buffer(SET max_memory = 51200)
WITH (STARTUP_STATE = ON); -- Auto-start với SQL Server
ALTER EVENT SESSION [CaptureDeadlocks] ON SERVER STATE = START;
-- Đọc deadlock data từ ring buffer
SELECT
xdr.value('@timestamp', 'DATETIME2') AS deadlock_time,
xdr.query('.') AS deadlock_xml
FROM (
SELECT CAST(target_data AS XML) AS target_data
FROM sys.dm_xe_session_targets t
JOIN sys.dm_xe_sessions s ON t.event_session_address = s.address
WHERE s.name = N'CaptureDeadlocks'
AND t.target_name = N'ring_buffer'
) AS data
CROSS APPLY target_data.nodes('//RingBufferTarget/event[@name="xml_deadlock_report"]') XEventData(xdr)
ORDER BY deadlock_time DESC;
-- Đọc XE file
SELECT
event_data.value('(event/@name)[1]', 'NVARCHAR(50)') AS event_name,
event_data.value('(event/@timestamp)[1]', 'DATETIME2') AS event_time,
event_data.value('(event/data[@name="duration"]/value)[1]', 'BIGINT') / 1000 AS duration_ms,
event_data.value('(event/action[@name="sql_text"]/value)[1]', 'NVARCHAR(MAX)') AS sql_text
FROM (
SELECT CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file(
'D:\XEvents\SlowQueries*.xel', NULL, NULL, NULL)
) AS XEData
ORDER BY event_time DESC;
-- Stop và drop session
ALTER EVENT SESSION [CaptureSlowQueries] ON SERVER STATE = STOP;
DROP EVENT SESSION [CaptureSlowQueries] ON SERVER;
SQL Server Error Log
-- Đọc error log hiện tại
EXEC xp_readerrorlog 0, 1; -- 0=current log, 1=error log (2=agent log)
-- Filter by keyword và time range
EXEC xp_readerrorlog
0, -- Log file number (0=current)
1, -- Log type (1=SQL, 2=Agent)
N'Error', -- Search string 1
NULL, -- Search string 2
'2026-04-01 00:00:00', -- Start time
'2026-04-01 23:59:59', -- End time
N'DESC'; -- Sort order
-- Cycle error log (tạo log mới, archive cũ)
EXEC sp_cycle_errorlog;
-- Xem số lượng error logs được giữ
EXEC xp_instance_regread
N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\MSSQLServer',
N'NumErrorLogs';
-- Config qua SSMS: Management > SQL Server Logs > Configure
Windows Performance Monitor Counters
Key SQL Server PerfMon counters cần monitor:
| Counter | Normal | Cảnh báo | Nghiêm trọng |
|---|---|---|---|
| Buffer Cache Hit Ratio | > 99% | 95-99% | < 95% |
| Page Life Expectancy | > 300s (tốt hơn > 1000s) | 200-300s | < 200s |
| Batch Requests/sec | Baseline | 20% trên baseline | 50% trên baseline |
| SQL Compilations/sec | < 10% của Batch Req | 10-20% | > 20% (thiếu reuse/parameterization) |
| SQL Re-Compilations/sec | < 10% của Compilations | ||
| Lock Waits/sec | 0 ideally | > 0 thường xuyên = blocking | |
| Deadlocks/sec | 0 | Any | Nhiều = design issue |
| Checkpoint Pages/sec | Varies | Cao liên tục = dirty page pressure | |
| Lazy Writes/sec | 0 | > 0 = memory pressure |
-- Lấy PLE qua DMV (không cần PerfMon external)
SELECT
object_name,
counter_name,
instance_name,
cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
'Page life expectancy',
'Batch Requests/sec',
'SQL Compilations/sec',
'SQL Re-Compilations/sec',
'Lock Waits/sec',
'Full Scans/sec',
'Index Searches/sec',
'Deadlocks/sec',
'Buffer cache hit ratio'
)
ORDER BY object_name, counter_name;
Activity Monitor
Activity Monitor trong SSMS (Right-click server → Activity Monitor) cung cấp:
- Overview: CPU, waits, I/O, batch requests/sec (real-time chart)
- Processes: active sessions, blocking chains
- Resource Waits: wait types heatmap
- Data File I/O: read/write per file
- Recent Expensive Queries: top queries hiện tại
Lưu ý: Activity Monitor có overhead nhất định, không nên để mở liên tục trên production.
Thiết lập Performance Baseline
-- Script thu thập baseline (chạy theo lịch để trend analysis)
CREATE TABLE dbo.PerformanceBaseline (
CaptureTime DATETIME2 DEFAULT SYSDATETIME(),
CounterName NVARCHAR(128),
InstanceName NVARCHAR(128),
CounterValue BIGINT
);
-- Insert snapshot
INSERT INTO dbo.PerformanceBaseline (CounterName, InstanceName, CounterValue)
SELECT counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
'Page life expectancy',
'Batch Requests/sec',
'SQL Compilations/sec'
);
-- Wait stats snapshot
CREATE TABLE dbo.WaitStatsBaseline (
CaptureTime DATETIME2 DEFAULT SYSDATETIME(),
WaitType NVARCHAR(60),
WaitTimeMs BIGINT,
WaitingTasks BIGINT
);
INSERT INTO dbo.WaitStatsBaseline (WaitType, WaitTimeMs, WaitingTasks)
SELECT wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'WAITFOR');
-- So sánh 2 snapshots để tính delta
SELECT
curr.WaitType,
curr.WaitTimeMs - prev.WaitTimeMs AS delta_wait_ms,
curr.WaitingTasks - prev.WaitingTasks AS delta_tasks
FROM dbo.WaitStatsBaseline curr
JOIN dbo.WaitStatsBaseline prev ON curr.WaitType = prev.WaitType
WHERE curr.CaptureTime = '2026-04-01 12:00:00'
AND prev.CaptureTime = '2026-04-01 11:00:00'
ORDER BY delta_wait_ms DESC;
Q&A - Phỏng vấn Monitoring & Diagnostics
Junior Level
Q1: DMV là gì? Tại sao quan trọng?
Dynamic Management Views (DMVs) là các views hệ thống trong SQL Server cung cấp thông tin runtime về server health, query performance, waits, locks. Quan trọng vì:
- Real-time diagnostic mà không cần external tools
- Phát hiện blocking, long-running queries, missing indexes
- Không cần đặc quyền cao (một số DMV chỉ cần VIEW SERVER STATE)
- Dữ liệu accumulate từ khi SQL Server restart (trừ một số reset khi plan cache clear)
Q2: Làm thế nào để tìm query đang blocking trong SQL Server?
SELECT r.session_id, r.blocking_session_id, t.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id > 0;
Sau đó dùng KILL session_id để giải phóng nếu thực sự cần.
Q3: Sự khác biệt giữa Extended Events và SQL Server Profiler?
| Extended Events | SQL Server Profiler | |
|---|---|---|
| Performance overhead | Rất thấp | Cao (không dùng trên production) |
| Granularity | Rất chi tiết | Hạn chế hơn |
| Storage targets | Ring buffer, file, ETW | File, table |
| Real-time | Có | Có |
| Status | Hiện tại (recommended) | Legacy (deprecated) |
Mid Level
Q4: Giải thích wait statistics. Làm thế nào dùng để chẩn đoán bottleneck?
Khi SQL Server cần tài nguyên nhưng không available (CPU, I/O, lock, memory), task sẽ wait và ghi vào wait stats. Analyze top wait types để biết bottleneck:
- PAGEIOLATCH_*: data pages đang đọc từ disk → thêm RAM hoặc index
- LCK_M_*: blocking → tìm blocker, tối ưu transactions
- CXPACKET: parallel queries đang sync → điều chỉnh MAXDOP
- RESOURCE_SEMAPHORE: sort/hash cần memory grant → tối ưu queries, thêm RAM
Q5: Query Store khác gì với sys.dm_exec_query_stats?
| Query Store | sys.dm_exec_query_stats | |
|---|---|---|
| Persistence | Persisted trong database | Lost khi plan evicted from cache |
| History | Configurable (days/weeks) | Chỉ plans còn trong cache |
| Plan comparison | Có thể compare nhiều plans per query | Một plan per entry |
| Force plan | Có | Không |
| Overhead | Nhỏ | None (passive) |
Q6: Page Life Expectancy (PLE) là gì? Giá trị bao nhiêu là tốt?
PLE đo số giây trung bình một data page ở trong buffer pool trước khi bị evict. PLE thấp = pages thường xuyên phải đọc lại từ disk → I/O pressure.
- Traditional threshold: > 300 giây
- Modern guidance: baseline thường > 1000 giây; cảnh báo khi drop > 50% so với baseline
- Với nhiều NUMA nodes: PLE per NUMA node quan trọng hơn global PLE
Senior Level
Q7: Xây dựng monitoring strategy cho SQL Server production. Những gì cần monitor và alert?
Real-time alerts (immediate response):
- Deadlocks/sec > 0 → XE capture + notify DBA
- Blocking duration > 5 phút → auto-kill hoặc notify
- CPU > 90% trong 5+ phút
- Error log: severity 17+ errors
Trending metrics (collect baseline, alert on deviation):
- PLE trend (daily average, alert if < 300s)
- Wait stats delta (hourly snapshot comparison)
- I/O latency per file (> 20ms read = investigate)
- Disk free space (< 20% = warning, < 10% = critical)
- Log space usage (per database)
- TempDB space usage
Capacity planning (weekly/monthly):
- Database growth rate
- Index fragmentation trends
- Missing index opportunities
- Top growing tables
Tools: SQL Server Central Monitoring Repository (custom), Ola Hallengren + alerts, third-party (SolarWinds DPA, SentryOne/SQL Sentry, Redgate SQL Monitor)
Q8: Làm thế nào để diagnose một query đột ngột chậm hơn mà không reproduce được?
- Query Store (nếu đang bật): tìm query_id, so sánh plans, elapsed time trend theo thời gian. Nếu plan regression → force old plan
- sys.dm_exec_query_stats: xem plan_generation_num (số lần recompile), execution count, avg duration
- Extended Events: capture query_post_execution_showplan để xem actual plan
- Wait stats per query: dùng
SET STATISTICS IO, TIME ONđể reproduce - Parameter sniffing: thử
OPTION (RECOMPILE)hoặcOPTION (OPTIMIZE FOR UNKNOWN) - Statistics: kiểm tra last_updated của statistics trên các tables liên quan — outdated stats → bad estimates → bad plan
- Schema changes: ai đó drop/create index, thay đổi table structure
Q9: Explain tại sao CXPACKET wait cao không nhất thiết là vấn đề. Phân biệt CXPACKET “healthy” và “unhealthy”?
CXPACKET là wait type khi các threads trong parallel query plan đang sync (exchange operators). SQL Server phải balance work giữa các threads.
“Healthy” CXPACKET:
- Parallel query đang chạy nhanh, threads sync là bình thường
- CXPACKET is high nhưng query duration ngắn
“Unhealthy” CXPACKET:
- Thread skew: một thread xử lý 90% work, các thread khác chờ → CXPACKET cao + query chậm
- Thường kèm
SOS_SCHEDULER_YIELD(CPU pressure) - Nguyên nhân: data skew, bad statistics, non-sargable predicates
Giải pháp:
- Điều chỉnh
Cost Threshold for Parallelism(mặc định 5, thường tăng lên 50-75) MAXDOPper query:OPTION (MAXDOP 4)để limit parallelism- Fix statistics và data skew
- SQL Server 2019+: Intelligent Query Processing tự adjust parallelism
Từ SQL Server 2016 SP2+: CXCONSUMER (idle threads waiting) thay thế cho phần idle của CXPACKET, giúp phân tích chính xác hơn.