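Database inspection list
No. | Business system | |
---|---|---|
1 | Hostname | |
2 | Operating system | |
3 | Single instance/RAC | |
4 | IP address | |
5 | Address type | |
6 | Data type | |
7 | Database version | |
8 | Instance name | |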
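Inspection plan
Area | Check item | Standard |
---|---|---|
Cluster configuration | Cluster software version | The cluster software version must be equal to or higher than the DB software version |
Cluster configuration | Cluster service status | All services (except GSD) must be ONLINE. Note: in environments using asf for rac, the ASM resource does not need to be ONLINE |
Cluster configuration | OCR/Votedisk check | OCR and Votedisk status is normal |
Database configuration | Database version | Recommended: a version that has not reached end of service |
Database configuration | Database parameters | Meet current business performance and availability requirements |
Database configuration | Run logs and trace files | No abnormal errors (pay particular attention to ORA-00600 and ORA-07445) |
Database configuration | Control files | Status is normal |
Database configuration | Redo log files | Status is normal |
Database configuration | Data files | When data files are on raw devices, autoextend is not enabled |
Database configuration | Invalid objects | Check whether the database contains invalid objects |
Database configuration | Tablespaces | Tablespaces are locally managed and usage does not exceed 90% |
Database configuration | Resource limit analysis | Check whether processes and sessions have ever reached their maximum limits |
Basic database performance assessment | Peak-period wait events | No more than 20 waits raised at the same time by the same user for the same operation |
Basic database performance assessment | Database I/O response time | Data file read/write response time should not exceed 10 ms |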
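The numeric standards in the plan (90% tablespace usage, 20 concurrent waits, 10 ms I/O response time) lend themselves to a mechanical check. The following is a minimal Python sketch under that assumption; the metric names in `LIMITS` and the `evaluate` helper are hypothetical labels chosen for illustration, not part of any Oracle tool.

```python
# Thresholds copied from the inspection plan; the keys are hypothetical labels.
LIMITS = {
    "tablespace_used_pct": 90.0,  # tablespace usage must not exceed 90%
    "concurrent_waits": 20,       # waits by one user/operation at one time
    "datafile_io_ms": 10.0,       # data file read/write response time (ms)
}


def evaluate(metrics):
    """Return {name: 'OK' | 'ABNORMAL'} for every metric that has a limit."""
    result = {}
    for name, value in metrics.items():
        limit = LIMITS.get(name)
        if limit is None:
            continue  # no numeric standard defined for this metric
        result[name] = "OK" if value <= limit else "ABNORMAL"
    return result


if __name__ == "__main__":
    # Hypothetical measurements collected during an inspection.
    print(evaluate({"tablespace_used_pct": 93.5, "datafile_io_ms": 4.2}))
```

Metrics without a numeric standard (for example control file status) are skipped rather than guessed at.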
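Operating system inspection
Check the hostname
hostname
Check the Linux server's operating system version
cat /etc/redhat-release
Check disk space and inode usage
df -hT
df -ih
Check free memory
free -h
Check kernel, operating system, and CPU architecture information
uname -a
Check environment variables
env
Check system uptime, number of logged-in users, and load
uptime
List Oracle-related processes
ps -ef | grep oracle
Display process status in real time
top
List the current user's scheduled tasks (run as each relevant user, e.g. oracle, grid, root)
crontab -l
Monitor I/O load on system devices
iostat -xm 1 10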
Oracle cluster inspection
Cluster configuration
No. | Cluster type (standalone/RAC) | Cluster version | PSU patch version | Number of cluster nodes |
---|---|---|---|---|
1 | | | | |
2 | | | | |
RAC cluster inspection
Check cluster service status
Check CRS
su - grid
$ORACLE_HOME/bin/crsctl status resource -t
$ORACLE_HOME/bin/crsctl check crs
Check the voting disks
$ORACLE_HOME/bin/crsctl query css votedisk
Check nodeapps
$ORACLE_HOME/bin/srvctl status nodeapps
Check ASM
$ORACLE_HOME/bin/srvctl status asm
Check OCR
$ORACLE_HOME/bin/ocrcheck
Check resources (short form of crsctl status resource -t)
$ORACLE_HOME/bin/crsctl stat res -t
Query gv$asm_diskgroup for disk group type and space information
select name, total_mb, free_mb, USABLE_FILE_MB, TYPE from gv$asm_diskgroup;
Cluster log analysis
# CRS log
$GRID_HOME/log/HOSTNAME/crsd/crsd.log
# CSS log
$GRID_HOME/log/HOSTNAME/cssd/ocssd.log
# Cluster alert log (HOSTNAME is the node's host name)
$GRID_HOME/log/HOSTNAME/alertHOSTNAME.log
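Oracle database inspection
Collect database information
Basic database information
No. | Global database name | Database instance name | Database role |
---|---|---|---|
1 | | | |
2 | | | |
Database patch information
No. | Database version | PSU patch |
---|---|---|
1 | | |
2 | | |
Database inspection items
Database object sizes
-- 1. Total size of all segments in the database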
SELECT SUM(bytes)/1024/1024 AS "Total Size (MB)" FROM dba_segments;
-- 2. Tablespace sizes
-- Query method 1
SELECT
d.tablespace_name "Name",
TO_CHAR(NVL(a.BYTES / 1024 / 1024, 0),'99,999,990.99') "Size (M)",
TO_CHAR(NVL(a.BYTES - NVL(f.BYTES, 0),0) / 1024 / 1024,'999999990.999') "USE (M)",
TO_CHAR(NVL((a.BYTES - NVL(f.BYTES, 0)) / a.BYTES * 100,0),'990.00') || '%' "USAGE RATE %",
TO_CHAR(NVL(f.BYTES / 1024 / 1024, 0),'99,999,990.99') "free (M)",
TO_CHAR(NVL(f.BYTES / a.BYTES * 100, 0),'99,999,990.99') || '%' "free %"
FROM
SYS.dba_tablespaces d,
(SELECT tablespace_name,SUM(BYTES) BYTES FROM dba_data_files GROUP BY tablespace_name) a,
(SELECT tablespace_name,SUM(BYTES) BYTES FROM dba_free_space GROUP BY tablespace_name) f
WHERE
d.tablespace_name = a.tablespace_name(+)
AND d.tablespace_name = f.tablespace_name(+)
AND NOT (d.extent_management LIKE 'LOCAL' AND d.CONTENTS LIKE 'TEMPORARY')
UNION ALL
SELECT
d.tablespace_name "Name",
TO_CHAR(NVL(a.BYTES / 1024 / 1024, 0),'99,999,990.99') "Size (M)",
TO_CHAR(NVL(t.BYTES, 0) / 1024 / 1024,'999999990.99') "USE (M)",
TO_CHAR(NVL(t.BYTES / a.BYTES * 100, 0),'990.99') || '%' "USAGE RATE %",
TO_CHAR(NVL((a.BYTES - t.BYTES) / 1024 / 1024,0),'99,999,990.99') "free (M)",
TO_CHAR(NVL(w.BYTES / a.BYTES * 100, 0),'99,999,990.99') || '%' "free %"
FROM
SYS.dba_tablespaces d,
(SELECT tablespace_name,SUM(BYTES) BYTES FROM dba_temp_files GROUP BY tablespace_name) a,
(SELECT tablespace_name,(sum(tablespace_size) - sum(free_space)) BYTES
FROM DBA_TEMP_FREE_SPACE GROUP BY tablespace_name) t,
(SELECT tablespace_name,sum(free_space) BYTES FROM DBA_TEMP_FREE_SPACE GROUP BY tablespace_name) w
WHERE
d.tablespace_name = a.tablespace_name(+)
AND d.tablespace_name = t.tablespace_name(+)
AND d.tablespace_name = w.tablespace_name(+)
AND d.extent_management LIKE 'LOCAL'
AND d.CONTENTS LIKE 'TEMPORARY';
-- Query method 2
SELECT
D.TABLESPACE_NAME,
SPACE "SUM_SPACE(M)",
BLOCKS SUM_BLOCKS,
SPACE - NVL(FREE_SPACE, 0) "USED_SPACE(M)",
ROUND((1 - NVL(FREE_SPACE, 0) / SPACE) * 100,2) "USED_RATE(%)",
FREE_SPACE "FREE_SPACE(M)"
FROM
(
SELECT
TABLESPACE_NAME,
ROUND(SUM(BYTES) / (1024 * 1024),2) SPACE,
SUM(BLOCKS) BLOCKS
FROM DBA_DATA_FILES
GROUP BY TABLESPACE_NAME
) D,
(
SELECT
TABLESPACE_NAME,
ROUND(SUM(BYTES) / (1024 * 1024),2) FREE_SPACE
FROM DBA_FREE_SPACE
GROUP BY TABLESPACE_NAME
) F
WHERE
D.TABLESPACE_NAME = F.TABLESPACE_NAME(+)
UNION ALL -- temp files, if any
SELECT
D.TABLESPACE_NAME,
SPACE "SUM_SPACE(M)",
BLOCKS SUM_BLOCKS,
USED_SPACE "USED_SPACE(M)",
ROUND(NVL(USED_SPACE, 0) / SPACE * 100,2) "USED_RATE(%)",
NVL(FREE_SPACE, 0) "FREE_SPACE(M)"
FROM
(
SELECT
TABLESPACE_NAME,
ROUND(SUM(BYTES) / (1024 * 1024),2) SPACE,
SUM(BLOCKS) BLOCKS
FROM DBA_TEMP_FILES
GROUP BY TABLESPACE_NAME
) D,
(
SELECT
TABLESPACE_NAME,
ROUND(SUM(BYTES_USED) / (1024 * 1024),2) USED_SPACE,
ROUND(SUM(BYTES_FREE) / (1024 * 1024),2) FREE_SPACE
FROM V$TEMP_SPACE_HEADER
GROUP BY TABLESPACE_NAME
) F
WHERE
D.TABLESPACE_NAME = F.TABLESPACE_NAME(+)
ORDER BY 5 DESC;
-- 3. Size of each schema
SELECT
owner AS "Schema",
SUM(bytes) / 1024 / 1024 AS "Total Size (MB)"
FROM
dba_segments
GROUP BY owner;
-- 4. Size grouped by tablespace and schema
select TABLESPACE_NAME,owner,sum(BYTES)/1024/1024 as "Total Size (MB)"
from dba_segments
group by TABLESPACE_NAME,owner
order by 1,2;
-- 5. Largest objects in a given schema (SYS is used here as an example)
select OWNER,SEGMENT_NAME,SEGMENT_TYPE,BYTES/1024/1024 as "Total Size (MB)" from dba_segments
where OWNER = 'SYS'
order by BYTES desc;
Database information
select DB_UNIQUE_NAME,
INST_ID,
dbid,
name,
OPEN_MODE,
VERSION_TIME,
LOG_MODE,
DATABASE_ROLE,
PROTECTION_MODE,
CREATED
from gv$database;
Database instance information
set lin 500
col HOST_NAME for a20;
select HOST_NAME, STARTUP_TIME, STATUS from gv$instance;
Database option information
select parameter,value from gv$option;
Check redo log status
col member for a100
set linesize 200
select MEMBER from v$logfile;
select group#, sequence#, bytes/(1024 * 1024 * 1024) GB, members, status, THREAD# from v$log;
Check database connections
Check whether the current session count is within the normal range.
select count(*) from v$session;
View database parameters and resource limits
show parameter spfile;
select *
from gv$resource_limit
where resource_name in ('processes', 'sessions');
Check which tables have stale or missing statistics
set linesize 150
set pagesize 1000
SELECT OWNER, TABLE_NAME, PARTITION_NAME,
OBJECT_TYPE, STALE_STATS, LAST_ANALYZED
FROM DBA_TAB_STATISTICS
WHERE (STALE_STATS = 'YES' OR LAST_ANALYZED IS NULL)
AND OWNER NOT IN ('MDDATA', 'MDSYS', 'ORDSYS', 'CTXSYS',
'ANONYMOUS', 'EXFSYS', 'OUTLN', 'DIP',
'DMSYS', 'WMSYS', 'XDB', 'ORACLE_OCM',
'TSMSYS', 'ORDPLUGINS', 'SI_INFORMTN_SCHEMA',
'OLAPSYS', 'SYSTEM', 'SYS', 'SYSMAN',
'DBSNMP', 'SCOTT', 'PERFSTAT', 'PUBLIC',
'MGMT_VIEW', 'WK_TEST', 'WKPROXY', 'WKSYS')
AND TABLE_NAME NOT LIKE 'BIN%'
order by 1,2;
Check whether automatic statistics collection is enabled
select window_name,autotask_status,optimizer_stats from dba_autotask_window_clients;
select client_name,status from Dba_Autotask_Client where client_name='auto optimizer stats collection';
select window_name,repeat_interval,duration,enabled from dba_scheduler_windows where ENABLED='TRUE' AND window_name not like 'WEEK%';
Check data file status and record any data file that is not online
set lin 200;
SELECT file_name, online_status FROM dba_data_files WHERE online_status NOT IN ('ONLINE', 'SYSTEM');
set pagesize 1000
col name format a58;
PROMPT
PROMPT database's datafile and tempfile
SELECT FILE#,NAME,STATUS,ENABLED,BYTES/1024/1024 MB,BLOCK_SIZE FROM v$datafile
UNION ALL
SELECT FILE#,NAME,STATUS,ENABLED,BYTES/1024/1024 MB,BLOCK_SIZE FROM v$tempfile ;
Check whether the database files are stored on shared storage
show parameter db_create_file_dest
select tablespace_name,file_name from dba_data_files;
Note: on RAC, check this on every node
Check rollback segment space configuration
set linesize 140
col segment for a25;
col tablespace_name for a20;
set pagesize 35;
col ds.bytes/1024/1024 heading 'Bytes(M)' for 9999
col status for a10
select
rb.segment_name "Segment",
rb.tablespace_name,
rs.optsize,
rs.status,
round(100*(1-waits/gets),2) "Ratio",
round(ds.bytes/1024/1024) "size (M)"
from dba_rollback_segs rb,
v$rollstat rs,
dba_segments ds
where
rb.segment_id=rs.usn
and rb.segment_name=ds.segment_name
/
Check Oracle control file status
select status,name from v$controlfile;
Check Oracle online redo log status
col MEMBER format a50;
set lin 100
select group#,status,type,member from v$logfile;
Check Oracle tablespace status
select tablespace_name,status from dba_tablespaces;
Check the status of all Oracle data files
select name,status from v$datafile;
select file_name,status from dba_data_files;
Check for invalid objects
Statement 1
select owner,object_name,object_type from dba_objects where status!='VALID' and owner!='SYS' and owner!='SYSTEM';
Statement 2
select owner,object_name,object_type,status
from dba_objects
where status !='VALID'
and owner not in ('SYS','SYSTEM');
Check tablespace usage
Tablespace information
Statement:
SELECT d.tablespace_name "Name",
TO_CHAR(NVL(a.BYTES / 1024 / 1024, 0), '99,999,990.99') "Size (M)",
TO_CHAR(NVL(a.BYTES - NVL(f.BYTES, 0), 0) / 1024 / 1024,
'999999990.999') "USE (M)",
TO_CHAR(NVL((a.BYTES - NVL(f.BYTES, 0)) / a.BYTES * 100, 0),
'990.00') || '%' "USAGE RATE %",
TO_CHAR(NVL(f.BYTES / 1024 / 1024, 0), '99,999,990.99') "free (M)",
TO_CHAR(NVL(f.BYTES / a.BYTES * 100, 0), '99,999,990.99') || '%' "free %"
FROM SYS.dba_tablespaces d,
(SELECT tablespace_name, SUM(BYTES) BYTES
FROM dba_data_files
GROUP BY tablespace_name) a,
(SELECT tablespace_name, SUM(BYTES) BYTES
FROM dba_free_space
GROUP BY tablespace_name) f
WHERE d.tablespace_name = a.tablespace_name(+)
AND d.tablespace_name = f.tablespace_name(+)
AND NOT
(d.extent_management LIKE 'LOCAL' AND d.CONTENTS LIKE 'TEMPORARY')
UNION ALL
SELECT d.tablespace_name "Name",
TO_CHAR(NVL(a.BYTES / 1024 / 1024, 0), '99,999,990.99') "Size (M)",
TO_CHAR(NVL(t.BYTES, 0) / 1024 / 1024, '999999990.99') "USE (M)",
TO_CHAR(NVL(t.BYTES / a.BYTES * 100, 0), '990.99') || '%' "USAGE RATE %",
TO_CHAR(NVL((a.BYTES - t.BYTES) / 1024 / 1024, 0), '99,999,990.99') "free (M)",
TO_CHAR(NVL(w.BYTES / a.BYTES * 100, 0), '99,999,990.99') || '%' "free %"
FROM SYS.dba_tablespaces d,
(SELECT tablespace_name, SUM(BYTES) BYTES
FROM dba_temp_files
GROUP BY tablespace_name) a,
(SELECT tablespace_name,
(sum(tablespace_size) - sum(free_space)) BYTES
FROM DBA_TEMP_FREE_SPACE
GROUP BY tablespace_name) t,
(SELECT tablespace_name, sum(free_space) BYTES
FROM DBA_TEMP_FREE_SPACE
GROUP BY tablespace_name) w
WHERE d.tablespace_name = a.tablespace_name(+)
AND d.tablespace_name = t.tablespace_name(+)
AND d.tablespace_name = w.tablespace_name(+)
AND d.extent_management LIKE 'LOCAL'
AND d.CONTENTS LIKE 'TEMPORARY';
Check daily tablespace growth (last AWR snapshot of each day, past 30 days)
SELECT a.snap_id,
c.tablespace_name ts_name,
to_char(to_date(a.rtime, 'mm/dd/yyyy hh24:mi:ss'), 'yyyy-mm-dd hh24:mi') rtime,
round(a.tablespace_size * c.block_size / 1024 / 1024, 2) ts_size_mb,
round(a.tablespace_usedsize * c.block_size / 1024 / 1024, 2) ts_used_mb,
round((a.tablespace_size - a.tablespace_usedsize) * c.block_size / 1024 / 1024,
2) ts_free_mb,
round(a.tablespace_usedsize / a.tablespace_size * 100, 2) pct_used
FROM dba_hist_tbspc_space_usage a,
(SELECT tablespace_id,
substr(rtime, 1, 10) rtime,
max(snap_id) snap_id
FROM dba_hist_tbspc_space_usage nb
group by tablespace_id, substr(rtime, 1, 10)) b,
dba_tablespaces c,
v$tablespace d
where a.snap_id = b.snap_id
and a.tablespace_id = b.tablespace_id
and a.tablespace_id=d.TS#
and d.NAME=c.tablespace_name
and to_date(a.rtime, 'mm/dd/yyyy hh24:mi:ss') >=sysdate-30
order by a.tablespace_id,to_date(a.rtime, 'mm/dd/yyyy hh24:mi:ss') desc;
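The query returns one row per tablespace per day, but not the day-over-day change itself. As a rough illustration of how the growth could be derived from those rows, here is a small Python sketch. It assumes the rows for a single tablespace have already been fetched as (day, used MB) pairs; the `daily_growth` helper and the sample figures are hypothetical.

```python
from datetime import date


def daily_growth(samples):
    """samples: iterable of (day, used_mb) for ONE tablespace, one row per day.

    Returns a list of (day, growth_mb) relative to the previous sampled day,
    oldest first.
    """
    ordered = sorted(samples)  # sort by day, oldest first
    growth = []
    for (_, prev_used), (day, used) in zip(ordered, ordered[1:]):
        growth.append((day, round(used - prev_used, 2)))
    return growth


if __name__ == "__main__":
    # Hypothetical values, newest first like the query output.
    rows = [
        (date(2024, 1, 3), 1250.0),
        (date(2024, 1, 2), 1180.5),
        (date(2024, 1, 1), 1100.0),
    ]
    for day, delta in daily_growth(rows):
        print(day, delta)
```

A negative delta simply means space was released that day (for example after a purge); the sketch does not treat it specially.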
Disk group information
SELECT 'asm disk used:' FROM dual;
set heading ON;
select group_number gno,name,state,type,total_mb,free_mb,required_mirror_free_mb rmfmb,usable_file_mb ufmb from v$asm_diskgroup;
ASM disk group usage
set line 400
col name for a12
col per for a15
select group_number,
name,
total_mb / 1024 total_g,
round((total_mb - free_mb) / 1024, 2) used_g,
round(free_mb / 1024, 2) free_g,
round(usable_file_mb / 1024, 2) usable_g,
round((total_mb - usable_file_mb) / total_mb * 100, 2) || '%' per
from v$asm_diskgroup;
Archive space (fast recovery area) usage
set line 400
col name for a10
col per for a10
select name,
space_limit / 1024 / 1024 / 1024 total,
round(space_used / 1024 / 1024 / 1024, 2) used,
round((space_limit - space_used) / 1024 / 1024 / 1024, 2) free,
round(space_used / space_limit * 100, 2) || '%' per
from v$recovery_file_dest;
Estimate the actual data volume of the database
select nvl(t.owner,'total') "user_name",
to_char(sum(bytes)/1024/1024,'999,999,999,999') "used (M)"
from dba_segments t
group by rollup(t.owner)
order by 2;
Count objects in dba_recyclebin and list grantees of the DBA role
select count( * ) from dba_recyclebin;
select * from dba_role_privs where granted_role = 'DBA';
set linesize 200
View sessions holding object locks (potential blocking or deadlock)
select username, lockwait, status, machine, program
from v$session
where sid in (select session_id from v$locked_object);
Backup check
select command_id,
input_type,
to_char(start_time, 'yyyy-mm-dd hh24:mi:ss') start_time,
to_char(end_time, 'yyyy-mm-dd hh24:mi:ss') end_time,
input_bytes_display input,
output_bytes_display output,
time_taken_display elapsed_time,
status
from v$rman_backup_job_details
where substrc(command_id, 0, 10) >= to_char(sysdate - 1, 'yyyy-mm-dd')
order by 1 desc;
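Database log analysis
# Locating the alert log
Query the view: select * from v$diag_info;
# Log location for a single-instance database
$ORACLE_BASE/diag/rdbms/<database name>/<instance name>/trace
# Log locations for a RAC database
$ORACLE_BASE/diag/rdbms/<database name>/<instance name 1>/trace
$ORACLE_BASE/diag/rdbms/<database name>/<instance name 2>/trace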
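Since the inspection plan singles out 600 and 7445 errors, a quick scan of the alert log for those two codes can save time. The sketch below is one possible way to do that in Python; the `critical_errors` helper is hypothetical and the sample lines are made up for illustration, not taken from a real log.

```python
import re

# Matches ORA-00600 / ORA-07445, with or without leading zeros.
CRITICAL = re.compile(r"ORA-0*(600|7445)\b")


def critical_errors(lines):
    """Return (line_number, text) for each line mentioning ORA-600 or ORA-7445."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if CRITICAL.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical alert log excerpt.
    sample = [
        "Thread 1 advanced to log sequence 812\n",
        "ORA-00600: internal error code, arguments: [example]\n",
        "ORA-01555: snapshot too old\n",
        "ORA-07445: exception encountered: core dump [example]\n",
    ]
    for number, text in critical_errors(sample):
        print(number, text)
```

To scan a real file, an open file object can be passed in place of the list, since the function only iterates over lines.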
Performance analysis
Check database wait events
set pages 80
set lines 120
col event for a40
select sid,event,p1,p2,p3,WAIT_TIME,SECONDS_IN_WAIT from v$session_wait where event not like 'SQL%' and event not like 'rdbms%';
If wait events such as latch free, enqueue, buffer busy waits, db file sequential read, and db file scattered read appear in large numbers over a sustained period, analyze them; they may point to problematic SQL statements.
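To spot "large numbers" of the same wait at a glance, the rows from the v$session_wait query can be tallied by event. The following Python sketch shows the idea; it assumes the rows are available as (sid, event) pairs, and it reuses the limit of 20 from the inspection plan while simplifying it to a per-event count (the plan's standard is per user and operation). The `heavy_waits` helper is a hypothetical name.

```python
from collections import Counter


def heavy_waits(rows, limit=20):
    """rows: iterable of (sid, event) pairs from the v$session_wait query.

    Returns {event: waiting_sessions} for events with more than `limit` waiters.
    """
    counts = Counter(event for _sid, event in rows)
    return {event: n for event, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Hypothetical snapshot: 25 sessions on one event, 1 on another.
    rows = [(sid, "db file sequential read") for sid in range(25)]
    rows.append((100, "latch free"))
    print(heavy_waits(rows))
```

A single snapshot only shows the situation at one instant; repeating the query at intervals during peak hours gives a better picture of whether the waits persist.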
Get the SQL statements with the highest disk reads
SELECT SQL_TEXT FROM (SELECT * FROM V$SQLAREA ORDER BY DISK_READS DESC) WHERE ROWNUM<=5;
Find the 10 worst-performing SQL statements
SELECT * FROM (SELECT PARSING_USER_ID,
EXECUTIONS,SORTS,COMMAND_TYPE,DISK_READS,
SQL_TEXT FROM V$SQLAREA ORDER BY DISK_READS DESC)
WHERE ROWNUM<=10 ;
Get the 5 system wait events with the most wait time
SELECT * FROM (SELECT * FROM V$SYSTEM_EVENT WHERE EVENT NOT LIKE 'SQL%' ORDER BY TIME_WAITED DESC) WHERE ROWNUM<=5;
Check long-running SQL
COLUMN USERNAME FORMAT A12
COLUMN OPNAME FORMAT A16
COLUMN PROGRESS FORMAT A8
SELECT USERNAME,SID,OPNAME,ROUND(SOFAR*100 / TOTALWORK,0) || '%' AS PROGRESS,TIME_REMAINING,SQL_TEXT FROM V$SESSION_LONGOPS , V$SQL WHERE TIME_REMAINING <> 0 AND SQL_ADDRESS=ADDRESS AND SQL_HASH_VALUE = HASH_VALUE;
Check the process consuming the most CPU (supply its OS process ID, e.g. from top, as &1)
SET LINE 240
SET VERIFY OFF
COLUMN SID FORMAT 999
COLUMN PID FORMAT 999
COLUMN S_# FORMAT 999
COLUMN USERNAME FORMAT A9 HEADING "ORA USER"
COLUMN PROGRAM FORMAT A29
COLUMN SQL FORMAT A60
COLUMN OSNAME FORMAT A9 HEADING "OS USER"
SELECT P.PID PID,
S.SID SID,
P.SPID SPID,
S.USERNAME USERNAME,
S.OSUSER OSNAME,
P.SERIAL# S_#,
P.TERMINAL,
P.PROGRAM PROGRAM,
P.BACKGROUND,
S.STATUS,
RTRIM(SUBSTR(A.SQL_TEXT, 1, 80)) SQL
FROM V$PROCESS P,
V$SESSION S,
V$SQLAREA A
WHERE P.ADDR = S.PADDR AND S.SQL_ADDRESS = A.ADDRESS(+) AND P.SPID LIKE '%&1%';
Check the most fragmented tables (segments with the highest extent count)
SELECT segment_name table_name, COUNT(*) extents
FROM dba_extents
WHERE owner NOT IN ('SYS', 'SYSTEM')
GROUP BY segment_name
HAVING COUNT(*) = (SELECT MAX(COUNT(*))
FROM dba_extents
WHERE owner NOT IN ('SYS', 'SYSTEM')
GROUP BY segment_name);
Check I/O distribution across tablespaces
SELECT DF.TABLESPACE_NAME NAME,
DF.FILE_NAME "FILE",
F.PHYRDS PYR,
F.PHYBLKRD PBR,
F.PHYWRTS PYW,
F.PHYBLKWRT PBW
FROM V$FILESTAT F, DBA_DATA_FILES DF
WHERE F.FILE# = DF.FILE_ID
ORDER BY DF.TABLESPACE_NAME;
Check I/O distribution across data files
SELECT SUBSTR(A.FILE#, 1, 2) "#",
SUBSTR(A.NAME, 1, 30) "NAME",
A.STATUS,
A.BYTES,
B.PHYRDS,
B.PHYWRTS
FROM V$DATAFILE A, V$FILESTAT B
WHERE A.FILE# = B.FILE#;
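Check database CPU, I/O, and memory performance
Record the database host's CPU, I/O, and memory usage. Use commands such as vmstat, iostat, sar, and top to collect this information, then review it to judge resource usage.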
# top
top - 10:29:35 up 73 days, 19:54, 1 user, load average: 0.37, 0.38, 0.29
Tasks: 353 total, 2 running, 351 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.2% us, 0.1% sy, 0.0% ni,98.8% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 16404472k total, 12887428k used, 3517044k free, 60796k buffers
Swap: 8385920k total, 665576k used, 7720344k free, 10358384k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30495 oracle 15 0 8329m 866m 861m R 10 5.4 7:53.90 oracle
32501 oracle 15 0 8328m 1.7g 1.7g S 2 10.6 1:58.38 oracle
32503 oracle 15 0 8329m 1.6g 1.6g S 2 10.2 2:06.62 oracle
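Note the id value above: it is the percentage of CPU that is idle. When its average falls below 10%, treat CPU usage as abnormal, record the value, and mark the status as abnormal.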
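Because the rule is about the average idle value rather than a single reading, it helps to collect several samples before judging. The Python sketch below illustrates this under the assumption that the Cpu(s) line has the layout shown in the sample output; the `idle_from_top` and `cpu_abnormal` helpers are hypothetical names.

```python
import re

# Captures the number in front of "id" on top's Cpu(s) summary line.
IDLE = re.compile(r"(\d+(?:\.\d+)?)\s*%?\s*id\b")


def idle_from_top(cpu_line):
    """Extract the idle percentage from a top 'Cpu(s):' line."""
    match = IDLE.search(cpu_line)
    if match is None:
        raise ValueError("no idle value found in: " + cpu_line)
    return float(match.group(1))


def cpu_abnormal(idle_samples, limit=10.0):
    """True when the average idle percentage is below `limit`."""
    average = sum(idle_samples) / len(idle_samples)
    return average < limit


if __name__ == "__main__":
    line = "Cpu(s): 1.2% us, 0.1% sy, 0.0% ni,98.8% id, 0.0% wa, 0.0% hi, 0.0% si"
    idle = idle_from_top(line)
    print(idle, cpu_abnormal([idle]))
```

Other top versions format this line differently, so the regular expression may need adjusting for the host being inspected.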
Memory usage
# free -m
total used free shared buffers cached
Mem: 2026 1958 67 0 76 1556
-/+ buffers/cache: 326 1700
Swap: 5992 92 5900
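As shown above, total (2026) is the system's total memory, used (1958) is the memory in use, and free (67) is the remaining memory; free memory below 10% of total is treated as abnormal. Because Linux uses otherwise idle memory for buffers and cache, judge by the free value on the -/+ buffers/cache line (1700 here) rather than the raw free value on the Mem line.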
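The 10% rule can be applied to the sample output with a few lines of Python. This sketch assumes the older `free -m` layout shown above, which includes a "-/+ buffers/cache" line; it reports both the raw free percentage and the percentage once buffers and cache are counted as reclaimable. The `free_memory_pct` helper is a hypothetical name.

```python
SAMPLE = """\
             total  used  free  shared  buffers  cached
Mem:          2026  1958    67       0       76    1556
-/+ buffers/cache:   326  1700
Swap:         5992    92  5900
"""


def free_memory_pct(free_output):
    """Parse classic `free -m` output; return (raw_pct, effective_pct)."""
    total = raw_free = effective_free = None
    for line in free_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            total, raw_free = int(fields[1]), int(fields[3])
        elif fields[0] == "-/+":
            # Free memory once buffers and cache are treated as reclaimable.
            effective_free = int(fields[3])
    if total is None or effective_free is None:
        raise ValueError("expected the older free layout with a -/+ line")
    return (round(raw_free / total * 100, 2),
            round(effective_free / total * 100, 2))


if __name__ == "__main__":
    raw, effective = free_memory_pct(SAMPLE)
    print(raw, effective, "abnormal" if effective < 10 else "ok")
```

On the sample, the raw figure is about 3% while the effective figure is about 84%, which is why the raw Mem free value alone can be misleading.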
System I/O
# iostat -k 1 3
(-k reports in kB; 1-second interval; 3 reports)
Linux 2.6.9-22.ELsmp (AS14) 07/29/2009
avg-cpu: %user %nice %sys %iowait %idle
0.16 0.00 0.05 0.36 99.43
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 3.33 13.16 50.25 94483478 360665804
avg-cpu: %user %nice %sys %iowait %idle
0.00 0.00 0.00 0.00 100.00
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
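CPU attribute descriptions:
- %user: percentage of time the CPU spent in user mode.
- %nice: percentage of time the CPU spent in user mode running processes with a nice value.
- %system: percentage of time the CPU spent in system (kernel) mode.
- %iowait: percentage of time the CPU spent waiting for I/O to complete.
- %steal: percentage of time the virtual CPU spent in involuntary wait while the hypervisor was servicing another virtual processor.
- %idle: percentage of time the CPU was idle.
Note: a high %iowait indicates a disk I/O bottleneck. A high %idle means the CPU is mostly idle; if %idle is high but the system responds slowly, the CPU may be waiting on memory allocation, in which case consider adding memory. If %idle stays below 10, CPU capacity is relatively low and CPU is the resource that most needs attention.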
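Disk attribute descriptions (with -k, as in the output above, the columns appear as kB_read/s, kB_wrtn/s, kB_read, and kB_wrtn and are measured in kB rather than blocks):
- Device: disk name.
- tps: number of I/O requests issued to the device per second.
- Blk_read/s: blocks read per second.
- Blk_wrtn/s: blocks written per second.
- Blk_read: total blocks read.
- Blk_wrtn: total blocks written.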
System load
# uptime
12:08:37 up 162 days, 23:33, 15 users, load average: 0.01, 0.15, 0.10
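As shown above, load average is the system load. If any of the three values that follow exceeds 2.5, the system is overloaded; record the value in the inspection sheet and mark it as abnormal.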
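The load check can be scripted by pulling the three figures out of the uptime line. Below is a minimal Python sketch of that, using the 2.5 limit from the text; the `load_averages` and `overloaded` helpers are hypothetical names, and the regular expression assumes the output format shown above.

```python
import re

LOAD = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def load_averages(uptime_output):
    """Return the 1-, 5-, and 15-minute load averages as floats."""
    match = LOAD.search(uptime_output)
    if match is None:
        raise ValueError("no load average found")
    return tuple(float(value) for value in match.groups())


def overloaded(uptime_output, limit=2.5):
    """True when any of the three load averages exceeds `limit`."""
    return any(value > limit for value in load_averages(uptime_output))


if __name__ == "__main__":
    line = "12:08:37 up 162 days, 23:33, 15 users, load average: 0.01, 0.15, 0.10"
    print(load_averages(line), overloaded(line))
```

The limit is a parameter so that a site with a different standard can pass its own value.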
Check for dead (zombie) processes
# Some zombie processes can block other workloads; kill them periodically.
select spid from v$process where addr not in (select paddr from v$session);
Check the buffer cache hit ratio
SELECT a.VALUE + b.VALUE logical_reads,
c.VALUE phys_reads,
round(100 * (1 - c.value / (a.value + b.value)), 4) hit_ratio
FROM v$sysstat a,
v$sysstat b,
v$sysstat c
WHERE a.NAME = 'db block gets' AND b.NAME = 'consistent gets' AND c.NAME = 'physical reads';
If the hit ratio is below 90%, increase the db_cache_size database parameter.
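The arithmetic behind the query is simple enough to check by hand: logical reads are db block gets plus consistent gets, and the hit ratio is the share of those not served by physical reads. A small Python sketch of the same formula follows; the `buffer_hit_ratio` helper and the input figures are hypothetical.

```python
def buffer_hit_ratio(db_block_gets, consistent_gets, physical_reads):
    """Hit ratio in percent: 100 * (1 - physical / logical), as in the query."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return 0.0  # nothing read yet, so no meaningful ratio
    return round(100 * (1 - physical_reads / logical_reads), 4)


if __name__ == "__main__":
    # Hypothetical counters: 10,000 logical reads, 500 of them physical.
    ratio = buffer_hit_ratio(1000, 9000, 500)
    print(ratio, "below 90%: review db_cache_size" if ratio < 90 else "ok")
```

Since v$sysstat values are cumulative from instance startup, the ratio reflects the whole uptime rather than the current workload.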
Check the shared pool (library cache) hit ratio
select sum(pinhits)/sum(pins)*100 from v$librarycache;
If it is below 95%, change the application to use bind variables or adjust the shared_pool_size database parameter.
Check the sort area
select name,value from v$sysstat where name like '%sort%';
If the ratio disk/(memory+rows) is too high, adjust sort_area_size (when workarea_size_policy=MANUAL) or pga_aggregate_target (when workarea_size_policy=AUTO).
Check the log buffer
select name,value from v$sysstat where name in ('redo entries','redo buffer allocation retries');
If redo buffer allocation retries / redo entries exceeds 1%, increase log_buffer.
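As a worked example of the 1% rule, the two values returned by the query can be fed into a short calculation. The Python sketch below shows one way to do it; the `redo_retry_pct` helper and the sample counters are hypothetical.

```python
def redo_retry_pct(allocation_retries, redo_entries):
    """Percentage of redo entries that had to retry buffer allocation."""
    if redo_entries == 0:
        return 0.0  # no redo generated, so nothing to judge
    return round(allocation_retries / redo_entries * 100, 4)


def log_buffer_too_small(allocation_retries, redo_entries, limit_pct=1.0):
    """True when the retry percentage exceeds the 1% standard."""
    return redo_retry_pct(allocation_retries, redo_entries) > limit_pct


if __name__ == "__main__":
    # Hypothetical counters from v$sysstat.
    print(redo_retry_pct(50, 10000), log_buffer_too_small(50, 10000))
    print(redo_retry_pct(300, 10000), log_buffer_too_small(300, 10000))
```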
Find the SQL with the most buffer gets (memory reads)
SELECT t.ADDRESS,
t.SQL_TEXT,
RANK() OVER(ORDER BY t.buffer_gets DESC) AS rank_buffgets,
to_char(100*ratio_to_report(t.buffer_gets) OVER(),'99.99') AS pct_buffergets
FROM v$sql t;