Oracle Database Inspection

Database inspection inventory

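No. | Business system
1 | Hostname
2 | Operating system
3 | Standalone / RAC
4 | IP address
5 | Address type
6 | Data type
7 | Database version
8 | Instance name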

Inspection plan

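Area | Check item | Standard
Cluster configuration | Cluster software version | The cluster software version must be equal to or higher than the DB software version
Cluster configuration | Cluster service status | All services (except GSD) must be ONLINE. Note: in environments using asf for rac, the ASM resource does not need to be ONLINE
Cluster configuration | OCR/Votedisk check | OCR and Votedisk status is normal
Database configuration | Database version | Use a version that has not reached end of service
Database configuration | Database parameters | Meet current business performance and availability requirements
Database configuration | Run logs and trace files | No abnormal errors (pay particular attention to ORA-00600 and ORA-07445)
Database configuration | Control files | Check that the status is normal
Database configuration | Redo log files | Check that the status is normal
Database configuration | Data files | When data files are on raw devices, autoextend must not be enabled
Database configuration | Invalid objects | Check whether the database contains invalid objects
Database configuration | Tablespaces | Tablespaces are locally managed and usage does not exceed 90%
Database configuration | Resource limit analysis | Check whether processes and sessions have ever reached their maximum limits
Basic performance assessment | Peak-period wait events | Waits raised at the same time by the same user for the same operation must not exceed 20
Basic performance assessment | Database I/O response time | Data file read/write response time should not exceed 10 ms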
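
The numeric limits in the plan (tablespace usage at most 90%, data file I/O response at most 10 ms, at most 20 concurrent waits) can be checked mechanically once the values have been collected. The following is a minimal sketch, not part of the original checklist: the `Finding` class, the `evaluate` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    item: str
    value: float
    limit: float
    ok: bool


def evaluate(tablespace_used_pct, datafile_io_ms, peak_concurrent_waits):
    """Compare three measured values with the upper limits stated in the plan."""
    checks = [
        ("tablespace usage (%)", tablespace_used_pct, 90.0),
        ("data file I/O response (ms)", datafile_io_ms, 10.0),
        ("peak concurrent waits", peak_concurrent_waits, 20.0),
    ]
    # Every limit is an upper bound: a value above it is flagged as abnormal.
    return [Finding(item, value, limit, value <= limit)
            for item, value, limit in checks]


# Illustrative values only; real values come from the queries in this document.
for finding in evaluate(93.5, 4.2, 20):
    status = "OK" if finding.ok else "ABNORMAL"
    print(f"{finding.item}: {finding.value} (limit {finding.limit}) -> {status}")
```

With the sample values, only the tablespace usage is flagged, because 93.5 is above the 90% limit while the other two values sit at or below their limits.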

Operating system inspection

Check the hostname

hostname

Check the operating system version of the Linux server

cat /etc/redhat-release

Check disk space and inode usage

df -hT
df -ih

Check free memory

free -h

Show kernel / operating system / CPU information

uname -a 

Show environment variables

env

Show system uptime, number of users, and load

uptime

List all Oracle-related processes

ps -ef | grep oracle

Show process status in real time

top  

Show the current user's scheduled tasks (as root, run crontab -l -u USER for each other user)

crontab -l

Monitor the I/O load of system devices

iostat -xm 1 10

Oracle cluster inspection

Cluster configuration

No. | Cluster type (standalone/RAC) | Cluster version | PSU patch version | Number of cluster nodes
1 | | | |
2 | | | |

RAC cluster inspection

Check cluster service status

# Check CRS
su - grid
$ORACLE_HOME/bin/crsctl status resource -t
$ORACLE_HOME/bin/crsctl check crs
# Check the voting disks
$ORACLE_HOME/bin/crsctl query css votedisk
# Check nodeapps
$ORACLE_HOME/bin/srvctl status nodeapps
# Check ASM
$ORACLE_HOME/bin/srvctl status asm
# Check OCR
$ORACLE_HOME/bin/ocrcheck
# Check resources (short form of the status resource command above)
$ORACLE_HOME/bin/crsctl stat res -t

Query the gv$asm_diskgroup view for disk group capacity and redundancy type

select name, total_mb, free_mb, USABLE_FILE_MB, TYPE from gv$asm_diskgroup;

Cluster log analysis

# CRS log
$GRID_HOME/log/HOSTNAME/crsd/crsd.log
# CSS log
$GRID_HOME/log/HOSTNAME/cssd/ocssd.log
# Cluster alert log
$GRID_HOME/log/HOSTNAME/alertHOSTNAME.log

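Oracle database inspection

Collect database information

Basic database information

No. | Global database name | Instance name | Database role
1 | | |
2 | | |

Database patch information

No. | Database version | PSU patch
1 | |
2 | |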

Database inspection items

Database object sizes

-- 1. Total size of all segments in the database
SELECT SUM(bytes)/1024/1024 AS "Total Size (MB)" FROM dba_segments;
-- 2. Tablespace sizes
-- Query method 1
SELECT 
  d.tablespace_name "Name", 
  TO_CHAR(NVL(a.BYTES / 1024 / 1024, 0),'99,999,990.99') "Size (M)", 
  TO_CHAR(NVL(a.BYTES - NVL(f.BYTES, 0),0) / 1024 / 1024,'999999990.999') "USE (M)", 
  TO_CHAR(NVL((a.BYTES - NVL(f.BYTES, 0)) / a.BYTES * 100,0),'990.00') || '%' "USAGE RATE %", 
  TO_CHAR(NVL(f.BYTES / 1024 / 1024, 0),'99,999,990.99') "free (M)", 
  TO_CHAR(NVL(f.BYTES / a.BYTES * 100, 0),'99,999,990.99') || '%' "free %" 
FROM 
  SYS.dba_tablespaces d, 
  (SELECT tablespace_name,SUM(BYTES) BYTES FROM dba_data_files GROUP BY tablespace_name) a, 
  (SELECT tablespace_name,SUM(BYTES) BYTES FROM dba_free_space GROUP BY tablespace_name) f 
WHERE 
  d.tablespace_name = a.tablespace_name(+) 
  AND d.tablespace_name = f.tablespace_name(+) 
  AND NOT (d.extent_management LIKE 'LOCAL' AND d.CONTENTS LIKE 'TEMPORARY') 
UNION ALL 
SELECT 
  d.tablespace_name "Name", 
  TO_CHAR(NVL(a.BYTES / 1024 / 1024, 0),'99,999,990.99') "Size (M)", 
  TO_CHAR(NVL(t.BYTES, 0) / 1024 / 1024,'999999990.99') "USE (M)",
  TO_CHAR(NVL(t.BYTES / a.BYTES * 100, 0),'990.99') || '%' "USAGE RATE %", 
  TO_CHAR(NVL((a.BYTES - t.BYTES) / 1024 / 1024,0),'99,999,990.99') "free (M)", 
  TO_CHAR(NVL(w.BYTES / a.BYTES * 100, 0),'99,999,990.99') || '%' "free %" 
FROM 
  SYS.dba_tablespaces d, 
  (SELECT tablespace_name,SUM(BYTES) BYTES FROM dba_temp_files GROUP BY tablespace_name) a, 
  (SELECT tablespace_name,(sum(tablespace_size) - sum(free_space)) BYTES
  FROM DBA_TEMP_FREE_SPACE GROUP BY tablespace_name) t, 
  (SELECT tablespace_name,sum(free_space) BYTES FROM DBA_TEMP_FREE_SPACE GROUP BY tablespace_name) w 
WHERE 
  d.tablespace_name = a.tablespace_name(+) 
  AND d.tablespace_name = t.tablespace_name(+) 
  AND d.tablespace_name = w.tablespace_name(+) 
  AND d.extent_management LIKE 'LOCAL' 
  AND d.CONTENTS LIKE 'TEMPORARY';



-- Query method 2
SELECT 
  D.TABLESPACE_NAME, 
  SPACE "SUM_SPACE(M)", 
  BLOCKS SUM_BLOCKS, 
  SPACE - NVL(FREE_SPACE, 0) "USED_SPACE(M)", 
  ROUND((1 - NVL(FREE_SPACE, 0) / SPACE) * 100,2) "USED_RATE(%)", 
  FREE_SPACE "FREE_SPACE(M)" 
FROM 
  (
    SELECT 
      TABLESPACE_NAME, 
      ROUND(SUM(BYTES) / (1024 * 1024),2) SPACE, 
      SUM(BLOCKS) BLOCKS 
    FROM DBA_DATA_FILES 
    GROUP BY TABLESPACE_NAME
  ) D, 
  (
    SELECT 
      TABLESPACE_NAME, 
      ROUND(SUM(BYTES) / (1024 * 1024),2) FREE_SPACE 
    FROM DBA_FREE_SPACE 
    GROUP BY TABLESPACE_NAME
  ) F 
WHERE 
  D.TABLESPACE_NAME = F.TABLESPACE_NAME(+) 
UNION ALL    -- temporary tablespaces, if temp files exist
SELECT 
  D.TABLESPACE_NAME, 
  SPACE "SUM_SPACE(M)", 
  BLOCKS SUM_BLOCKS, 
  USED_SPACE "USED_SPACE(M)", 
  ROUND(NVL(USED_SPACE, 0) / SPACE * 100,2) "USED_RATE(%)", 
  NVL(FREE_SPACE, 0) "FREE_SPACE(M)" 
FROM 
  (
    SELECT 
      TABLESPACE_NAME, 
      ROUND(SUM(BYTES) / (1024 * 1024),2) SPACE, 
      SUM(BLOCKS) BLOCKS 
    FROM DBA_TEMP_FILES 
    GROUP BY TABLESPACE_NAME
  ) D, 
  (
    SELECT 
      TABLESPACE_NAME, 
      ROUND(SUM(BYTES_USED) / (1024 * 1024),2) USED_SPACE, 
      ROUND(SUM(BYTES_FREE) / (1024 * 1024),2) FREE_SPACE 
    FROM V$TEMP_SPACE_HEADER 
    GROUP BY TABLESPACE_NAME
  ) F 
WHERE 
  D.TABLESPACE_NAME = F.TABLESPACE_NAME(+) 
ORDER BY 5 DESC;


-- 3. Size of each schema
SELECT
    owner AS "Schema",
    SUM(bytes) / 1024 / 1024 AS "Total Size (MB)"
FROM
    dba_segments
GROUP BY owner;

-- 4. Size grouped by tablespace and schema
select TABLESPACE_NAME,owner,sum(BYTES)/1024/1024 as "Total Size (MB)"
from dba_segments
group by TABLESPACE_NAME,owner
order by 1,2;

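-- 5. Largest objects in a given schema (replace 'SYS' with the schema of interest; the limit of 10 rows is an example)
select * from (
  select OWNER,SEGMENT_NAME,SEGMENT_TYPE,BYTES/1024/1024 as "Total Size (MB)" from dba_segments
  where OWNER = 'SYS'
  order by BYTES desc)
where rownum <= 10;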

Database information

select DB_UNIQUE_NAME,
       INST_ID,
       dbid,
       name,
       OPEN_MODE,
       VERSION_TIME,
       LOG_MODE,
       DATABASE_ROLE,
       PROTECTION_MODE,
       CREATED
  from gv$database;

Database instance information

set lin 500
col HOST_NAME for a20;
select HOST_NAME, STARTUP_TIME, STATUS from gv$instance;

Database option information

select parameter,value from gv$option;

Check redo log status

col member for a100
set linesize 200
select MEMBER from v$logfile;
select group#, sequence#, bytes/(1024 * 1024 * 1024) GB, members, status, THREAD# from v$log;

Check database connections

Check whether the current number of sessions is within the normal range.
select count(*) from v$session;

Check database parameters

show parameter spfile;
select *
  from gv$resource_limit
 where resource_name in ('processes', 'sessions');

Check which tables have stale or missing statistics

set linesize 150
set pagesize 1000
SELECT OWNER, TABLE_NAME, PARTITION_NAME, 
       OBJECT_TYPE, STALE_STATS, LAST_ANALYZED 
  FROM DBA_TAB_STATISTICS
 WHERE (STALE_STATS = 'YES' OR LAST_ANALYZED IS NULL)
   AND OWNER NOT IN ('MDDATA', 'MDSYS', 'ORDSYS', 'CTXSYS', 
                     'ANONYMOUS', 'EXFSYS', 'OUTLN', 'DIP', 
                     'DMSYS', 'WMSYS', 'XDB', 'ORACLE_OCM', 
                     'TSMSYS', 'ORDPLUGINS', 'SI_INFORMTN_SCHEMA',
                     'OLAPSYS', 'SYSTEM', 'SYS', 'SYSMAN',
                     'DBSNMP', 'SCOTT', 'PERFSTAT', 'PUBLIC',
                     'MGMT_VIEW', 'WK_TEST', 'WKPROXY', 'WKSYS')
   AND TABLE_NAME NOT LIKE 'BIN%'
  order by 1,2;

Check whether automatic statistics collection is enabled

select window_name,autotask_status,optimizer_stats from dba_autotask_window_clients;
select client_name,status from Dba_Autotask_Client where client_name='auto optimizer stats collection';
select window_name,repeat_interval,duration,enabled from dba_scheduler_windows where ENABLED='TRUE' AND window_name not like 'WEEK%'; 


Check data file status and record any data file that is not ONLINE

set lin 200;
SELECT file_name, online_status FROM dba_data_files WHERE online_status NOT IN ('ONLINE', 'SYSTEM');
set pagesize  1000
col name format a58;
PROMPT
PROMPT database's datafile and tempfile
SELECT FILE#,NAME,STATUS,ENABLED,BYTES/1024/1024 MB,BLOCK_SIZE  FROM v$datafile  
UNION ALL
SELECT FILE#,NAME,STATUS,ENABLED,BYTES/1024/1024 MB,BLOCK_SIZE  FROM v$tempfile ;

Check whether the database files are stored on shared storage

show parameter db_create_file_dest
select tablespace_name,file_name from dba_data_files;
Note: on RAC, check this on every node.

Check rollback segment space configuration

set linesize 140
col segment for a25;
col tablespace_name for a20; 
set pagesize 35;
col ds.bytes/1024/1024 heading 'Bytes(M)' for 9999
col status for a10
select 
   rb.segment_name "Segment",
   rb.tablespace_name,
   rs.optsize,
   rs.status,
   round(100*(1-waits/gets),2) "Ratio",
   round(ds.bytes/1024/1024) "size (M)"
from dba_rollback_segs rb,
   v$rollstat rs,
   dba_segments ds 
where 
   rb.segment_id=rs.usn 
 and rb.segment_name=ds.segment_name
/

Check Oracle control file status

select status,name from v$controlfile;

Check Oracle online redo log status

col MEMBER format a50;
set lin 100
select group#,status,type,member from v$logfile;

Check Oracle tablespace status

select tablespace_name,status from dba_tablespaces;

Check the status of all Oracle data files

select name,status from v$datafile;
select file_name,status from dba_data_files;

Check for invalid objects

Statement 1
select owner,object_name,object_type from dba_objects where status!='VALID' and owner!='SYS' and owner!='SYSTEM';
Statement 2
select owner,object_name,object_type,status
from dba_objects
where status !='VALID'
and owner not in ('SYS','SYSTEM');

Check tablespace usage

Tablespace information
Statement:
SELECT d.tablespace_name "Name",
       TO_CHAR(NVL(a.BYTES / 1024 / 1024, 0), '99,999,990.99') "Size (M)",
       TO_CHAR(NVL(a.BYTES - NVL(f.BYTES, 0), 0) / 1024 / 1024,
               '999999990.999') "USE  (M)",
       TO_CHAR(NVL((a.BYTES - NVL(f.BYTES, 0)) / a.BYTES * 100, 0),
               '990.00') || '%' "USAGE RATE  %",
       TO_CHAR(NVL(f.BYTES / 1024 / 1024, 0), '99,999,990.99') "free (M)",
       TO_CHAR(NVL(f.BYTES / a.BYTES * 100, 0), '99,999,990.99') || '%' "free %"
  FROM SYS.dba_tablespaces d,
       (SELECT tablespace_name, SUM(BYTES) BYTES
          FROM dba_data_files
         GROUP BY tablespace_name) a,
       (SELECT tablespace_name, SUM(BYTES) BYTES
          FROM dba_free_space
         GROUP BY tablespace_name) f
 WHERE d.tablespace_name = a.tablespace_name(+)
   AND d.tablespace_name = f.tablespace_name(+)
   AND NOT
        (d.extent_management LIKE 'LOCAL' AND d.CONTENTS LIKE 'TEMPORARY')
UNION ALL
SELECT d.tablespace_name "Name",
       TO_CHAR(NVL(a.BYTES / 1024 / 1024, 0), '99,999,990.99') "Size (M)",
       TO_CHAR(NVL(t.BYTES, 0) / 1024 / 1024, '999999990.99') "USE  (M)",
       TO_CHAR(NVL(t.BYTES / a.BYTES * 100, 0), '990.99') || '%' "USAGE RATE %",
       TO_CHAR(NVL((a.BYTES - t.BYTES) / 1024 / 1024, 0), '99,999,990.99') "free (M)",
       TO_CHAR(NVL(w.BYTES / a.BYTES * 100, 0), '99,999,990.99') || '%' "free %"
  FROM SYS.dba_tablespaces d,
       (SELECT tablespace_name, SUM(BYTES) BYTES
          FROM dba_temp_files
         GROUP BY tablespace_name) a,
       (SELECT tablespace_name,
               (sum(tablespace_size) - sum(free_space)) BYTES
          FROM DBA_TEMP_FREE_SPACE
         GROUP BY tablespace_name) t,
       (SELECT tablespace_name, sum(free_space) BYTES
          FROM DBA_TEMP_FREE_SPACE
         GROUP BY tablespace_name) w
 WHERE d.tablespace_name = a.tablespace_name(+)
   AND d.tablespace_name = t.tablespace_name(+)
   AND d.tablespace_name = w.tablespace_name(+)
   AND d.extent_management LIKE 'LOCAL'
   AND d.CONTENTS LIKE 'TEMPORARY';

Check daily tablespace growth

SELECT a.snap_id,
       c.tablespace_name ts_name,
       to_char(to_date(a.rtime, 'mm/dd/yyyy hh24:mi:ss'), 'yyyy-mm-dd hh24:mi') rtime,
       round(a.tablespace_size * c.block_size / 1024 / 1024, 2) ts_size_mb,
       round(a.tablespace_usedsize * c.block_size / 1024 / 1024, 2) ts_used_mb,
       round((a.tablespace_size - a.tablespace_usedsize) * c.block_size / 1024 / 1024,
             2) ts_free_mb,
       round(a.tablespace_usedsize / a.tablespace_size * 100, 2) pct_used
  FROM dba_hist_tbspc_space_usage a, 
       (SELECT tablespace_id,
               substr(rtime, 1, 10) rtime,
               max(snap_id) snap_id
          FROM dba_hist_tbspc_space_usage nb
         group by tablespace_id, substr(rtime, 1, 10)) b,
         dba_tablespaces c,
         v$tablespace d
 where a.snap_id = b.snap_id
   and a.tablespace_id = b.tablespace_id
   and a.tablespace_id=d.TS#
   and d.NAME=c.tablespace_name  
   and  to_date(a.rtime, 'mm/dd/yyyy hh24:mi:ss') >=sysdate-30
   order by a.tablespace_id,to_date(a.rtime, 'mm/dd/yyyy hh24:mi:ss') desc;
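
The query above returns one usage figure per tablespace per day; the growth itself is the difference between consecutive days. The following is a minimal sketch of that step, assuming the rows have already been fetched as `(ts_name, rtime, ts_used_mb)` tuples. The function name `daily_growth` and the sample rows are hypothetical.

```python
def daily_growth(rows):
    """Return {tablespace: [(day, delta_mb), ...]} from (ts_name, day, used_mb) rows."""
    by_tablespace = {}
    for ts_name, day, used_mb in rows:
        by_tablespace.setdefault(ts_name, []).append((day, used_mb))

    growth = {}
    for ts_name, samples in by_tablespace.items():
        # 'yyyy-mm-dd ...' strings sort chronologically, so a plain sort is enough.
        samples.sort()
        growth[ts_name] = [
            (cur_day, round(cur_used - prev_used, 2))
            for (_, prev_used), (cur_day, cur_used) in zip(samples, samples[1:])
        ]
    return growth


# Illustrative rows only; the query returns them newest first, so order does not matter.
rows = [
    ("USERS", "2024-01-03", 1180.0),
    ("USERS", "2024-01-01", 1100.0),
    ("USERS", "2024-01-02", 1125.5),
]
print(daily_growth(rows))
```

A negative delta means space was released that day, for example after a purge or a segment shrink.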

Disk group information

SELECT 'asm disk used:' FROM dual;
set heading  ON;
select group_number gno,name,state,type,total_mb,free_mb,required_mirror_free_mb rmfmb,usable_file_mb ufmb from v$asm_diskgroup;

ASM disk group usage

 set line 400
 col name for a12
 col per for a15
 select group_number,
        name,
        total_mb / 1024 total_g,
        round((total_mb - free_mb) / 1024, 2) used_g,
        round(free_mb / 1024, 2) free_g, 
        round(usable_file_mb / 1024, 2) usable_g, 
        round((total_mb - usable_file_mb) / total_mb * 100, 2) || '%' per 
   from v$asm_diskgroup;

Archive (recovery area) space usage

set line 400
col name for a10
col per for a10
select name,
       space_limit / 1024 / 1024 / 1024 total,
       round(space_used / 1024 / 1024 / 1024, 2) used,
       round((space_limit - space_used) / 1024 / 1024 / 1024, 2) free,
       round(space_used / space_limit * 100, 2) || '%' per
  from v$recovery_file_dest;

Estimate the actual data volume of the database

select nvl(t.owner,'total') "user_name",
       to_char(sum(bytes)/1024/1024,'999,999,999,999') "used (M)"
  from dba_segments  t
 group by rollup(t.owner)
 order by 2;

Count the objects in dba_recyclebin and list the users granted the DBA role

select count( * ) from dba_recyclebin;
select * from dba_role_privs where granted_role = 'DBA';
set linesize 200

Check the sessions that hold object locks

select username, lockwait, status, machine, program
from v$session
where sid in (select session_id from v$locked_object);

Backup check

select command_id,
       input_type,
       to_char(start_time, 'yyyy-mm-dd hh24:mi:ss') start_time,
       to_char(end_time, 'yyyy-mm-dd hh24:mi:ss') end_time,
       input_bytes_display input,
       output_bytes_display output,
       time_taken_display elapsed_time,
       status
  from v$rman_backup_job_details
 where substrc(command_id, 0, 10) >= to_char(sysdate - 1, 'yyyy-mm-dd')
 order by 1 desc;

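Database log analysis

# Locating the alert log
Query the view: select * from v$diag_info; 

# Log location for a standalone database
$ORACLE_BASE/diag/rdbms/<database name>/<instance name>/trace

# Log locations for a RAC database
$ORACLE_BASE/diag/rdbms/<database name>/<instance name 1>/trace
$ORACLE_BASE/diag/rdbms/<database name>/<instance name 2>/trace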

Performance analysis

Check database wait events

set pages 80
set lines 120
col event for a40
select sid,event,p1,p2,p3,WAIT_TIME,SECONDS_IN_WAIT from v$session_wait where event not like 'SQL%' and event not like 'rdbms%';

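If the database shows a large number of wait events such as latch free, enqueue, buffer busy waits, db file sequential read, and db file scattered read over a sustained period, analyze them; the SQL statements involved may be the cause.

Get the SQL statements with the highest disk reads

SELECT SQL_TEXT FROM (SELECT * FROM V$SQLAREA ORDER BY DISK_READS DESC) WHERE ROWNUM<=5;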

Find the top 10 poorly performing SQL statements

SELECT * FROM (SELECT PARSING_USER_ID,
EXECUTIONS,SORTS,COMMAND_TYPE,DISK_READS,
SQL_TEXT FROM V$SQLAREA ORDER BY DISK_READS DESC)
WHERE ROWNUM<=10 ;

Get the 5 system wait events with the most wait time

SELECT * FROM (SELECT * FROM V$SYSTEM_EVENT WHERE EVENT NOT LIKE 'SQL%' ORDER BY TIME_WAITED DESC) WHERE ROWNUM<=5;

Check long-running SQL

COLUMN USERNAME FORMAT A12
COLUMN OPNAME FORMAT A16
COLUMN PROGRESS FORMAT A8
SELECT USERNAME,SID,OPNAME,ROUND(SOFAR*100 / TOTALWORK,0) || '%' AS PROGRESS,TIME_REMAINING,SQL_TEXT FROM V$SESSION_LONGOPS , V$SQL WHERE TIME_REMAINING <> 0 AND SQL_ADDRESS=ADDRESS AND SQL_HASH_VALUE = HASH_VALUE;

Check the process consuming the most CPU

SET LINE 240
SET VERIFY OFF
COLUMN SID FORMAT 999
COLUMN PID FORMAT 999
COLUMN S_# FORMAT 999
COLUMN USERNAME FORMAT A9 HEADING "ORA USER"
COLUMN PROGRAM FORMAT A29
COLUMN SQL     FORMAT A60
COLUMN OSNAME FORMAT A9 HEADING "OS USER"
SELECT P.PID PID,
       S.SID SID,
       P.SPID SPID,
       S.USERNAME USERNAME,
       S.OSUSER OSNAME,
       P.SERIAL# S_#,
       P.TERMINAL,
       P.PROGRAM PROGRAM,
       P.BACKGROUND,
       S.STATUS,
       RTRIM(SUBSTR(A.SQL_TEXT, 1, 80)) SQL
  FROM V$PROCESS P,
       V$SESSION S,
       V$SQLAREA A
 WHERE P.ADDR = S.PADDR
   AND S.SQL_ADDRESS = A.ADDRESS(+)
   AND P.SPID LIKE '%&1%';

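Check highly fragmented tables

-- Counts extents per segment (dba_extents has one row per extent) and returns the segment(s) with the most.
SELECT segment_name table_name, COUNT(*) extents
  FROM dba_extents
 WHERE owner NOT IN ('SYS', 'SYSTEM')
 GROUP BY segment_name
HAVING COUNT(*) = (SELECT MAX(COUNT(*))
                     FROM dba_extents
                    WHERE owner NOT IN ('SYS', 'SYSTEM')
                    GROUP BY segment_name);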

Check I/O distribution by tablespace

SELECT DF.TABLESPACE_NAME NAME,
       DF.FILE_NAME       "FILE",
       F.PHYRDS           PYR,
       F.PHYBLKRD         PBR,
       F.PHYWRTS          PYW,
       F.PHYBLKWRT        PBW
  FROM V$FILESTAT F, DBA_DATA_FILES DF
 WHERE F.FILE# = DF.FILE_ID
 ORDER BY DF.TABLESPACE_NAME;

Check I/O distribution by data file

SELECT SUBSTR(A.FILE#, 1, 2) "#",
       SUBSTR(A.NAME, 1, 30) "NAME",
       A.STATUS,
       A.BYTES,
       B.PHYRDS,
       B.PHYWRTS
  FROM V$DATAFILE A, V$FILESTAT B
 WHERE A.FILE# = B.FILE#;

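Check CPU, I/O, and memory performance

Record the CPU, I/O, and memory usage of the database host. Collect the information with commands such as vmstat, iostat, sar, and top, then review it to judge resource usage.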

# top
top - 10:29:35 up 73 days, 19:54, 1 user, load average: 0.37, 0.38, 0.29
Tasks: 353 total,  2 running, 351 sleeping,  0 stopped,  0 zombie
Cpu(s): 1.2% us, 0.1% sy, 0.0% ni,98.8% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 16404472k total, 12887428k used, 3517044k free,   60796k buffers
Swap: 8385920k total,  665576k used, 7720344k free, 10358384k cached
 
 PID USER     PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
30495 oracle   15  0 8329m 866m 861m R  10 5.4  7:53.90 oracle            
32501 oracle   15  0 8328m 1.7g 1.7g S   2 10.6  1:58.38 oracle            
32503 oracle   15  0 8329m 1.6g 1.6g S   2 10.2  2:06.62 oracle            

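Note the id value above: it is the percentage of CPU that is idle. When its average falls below 10%, CPU usage is considered abnormal; record the value and mark the status as abnormal.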
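
The idle check can be scripted once the top summary line has been captured. The following is a minimal sketch rather than a definitive parser: the function names are hypothetical, and the regular expression assumes the idle field is written as a number followed by `id`, as in the sample output above.

```python
import re


def cpu_idle_percent(top_cpu_line):
    """Extract the idle percentage from a top 'Cpu(s):' summary line."""
    match = re.search(r"([\d.]+)\s*%?\s*id", top_cpu_line)
    if match is None:
        raise ValueError("no idle field found in: " + top_cpu_line)
    return float(match.group(1))


def cpu_is_abnormal(idle_samples, threshold=10.0):
    """Return True when the average idle percentage is below the threshold."""
    average = sum(idle_samples) / len(idle_samples)
    return average < threshold


line = "Cpu(s): 1.2% us, 0.1% sy, 0.0% ni,98.8% id, 0.0% wa, 0.0% hi, 0.0% si"
idle = cpu_idle_percent(line)
print(idle, cpu_is_abnormal([idle]))
```

Because the rule refers to the average, several samples taken over the inspection window should be passed to `cpu_is_abnormal` rather than a single reading.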

Memory usage

# free -m
            total      used      free    shared   buffers    cached
Mem:       2026      1958       67         0        76      1556
-/+ buffers/cache:       326      1700
Swap:        5992        92      5900

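As shown above, total (2026) is the total system memory, used (1958) is the memory in use, and free (67) is the remaining memory. Free memory below 10% of the total is considered abnormal. Because buffers and cache can be reclaimed, the free figure on the "-/+ buffers/cache" line (1700) is the more meaningful value to compare.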
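
The 10% rule can be applied directly to the `free -m` output. The following is a minimal sketch with a hypothetical function name. It reports two ratios: the raw free figure from the `Mem:` line, and the figure from the `-/+ buffers/cache` line, which counts reclaimable buffers and cache as free. It assumes the older `free` layout shown above; newer versions omit the `-/+` line, in which case the second value is `None`.

```python
def free_memory_ratios(free_output):
    """Return (raw_free/total, adjusted_free/total) parsed from `free -m` output."""
    total = raw_free = adjusted_free = None
    for line in free_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            total, raw_free = float(fields[1]), float(fields[3])
        elif fields[0] == "-/+":
            # Layout: "-/+ buffers/cache: <used> <free>"
            adjusted_free = float(fields[3])
    if total is None:
        raise ValueError("no 'Mem:' line found")
    adjusted = None if adjusted_free is None else adjusted_free / total
    return raw_free / total, adjusted


sample = """\
            total      used      free    shared   buffers    cached
Mem:       2026      1958       67         0        76      1556
-/+ buffers/cache:       326      1700
Swap:        5992        92      5900
"""
raw, adjusted = free_memory_ratios(sample)
print(f"raw free: {raw:.1%}, free incl. buffers/cache: {adjusted:.1%}")
```

For the sample, the raw figure is about 3% while the adjusted figure is about 84%, which shows why the choice of figure decides whether the host is flagged.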

System I/O

# iostat -k 1 3      (report in kB; 1-second interval, 3 reports)
Linux 2.6.9-22.ELsmp (AS14)    07/29/2009
avg-cpu: %user  %nice   %sys %iowait  %idle
          0.16   0.00   0.05   0.36  99.43
Device:           tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda              3.33       13.16       50.25  94483478 360665804
 
avg-cpu: %user  %nice   %sys %iowait  %idle
          0.00   0.00   0.00   0.00 100.00
Device:           tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda              0.00        0.00        0.00         0         0

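CPU field descriptions:

  • %user: percentage of time the CPU spent in user mode.

  • %nice: percentage of time the CPU spent in user mode with a nice priority.

  • %system: percentage of time the CPU spent in system (kernel) mode.

  • %iowait: percentage of time the CPU spent waiting for I/O to complete.

  • %steal: percentage of time the virtual CPU waited involuntarily while the hypervisor serviced another virtual processor.

  • %idle: percentage of time the CPU was idle.

Note: a high %iowait indicates a disk I/O bottleneck. A high %idle means the CPU is largely idle; if %idle is high but the system responds slowly, the CPU may be waiting for memory allocation, and memory should be increased. If %idle stays below 10, CPU capacity is insufficient and CPU is the resource that most needs attention.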

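Disk field descriptions (with -k, the Blk_ columns are shown in kilobytes as kB_read/s, kB_wrtn/s, kB_read, and kB_wrtn):

  • device: disk name.
  • tps: number of I/O requests issued to the device per second.
  • Blk_read/s: blocks read per second.
  • Blk_wrtn/s: blocks written per second.
  • Blk_read: total blocks read.
  • Blk_wrtn: total blocks written.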

System load

# uptime
12:08:37 up 162 days, 23:33, 15 users, load average: 0.01, 0.15, 0.10

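As shown above, load average is the system load. If any of the three values that follow it exceeds 2.5, the system is overloaded; record the value in the inspection sheet and mark it as abnormal.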
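
The load check can be automated by parsing the uptime output. The following is a minimal sketch with hypothetical function names; it assumes the three averages are printed with a dot as the decimal separator, as in the sample above.

```python
import re


def load_averages(uptime_output):
    """Return the 1-, 5- and 15-minute load averages as a tuple of floats."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_output
    )
    if match is None:
        raise ValueError("no load average found in: " + uptime_output)
    return tuple(float(value) for value in match.groups())


def load_is_abnormal(loads, limit=2.5):
    """Return True when any of the load averages exceeds the limit."""
    return any(value > limit for value in loads)


line = "12:08:37 up 162 days, 23:33, 15 users, load average: 0.01, 0.15, 0.10"
loads = load_averages(line)
print(loads, load_is_abnormal(loads))
```

The fixed limit of 2.5 comes from the rule above and takes no account of the number of CPU cores on the host.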

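Check for orphaned (zombie) processes

# Some orphaned processes can block normal business operations; after verifying them, kill them periodically.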
select spid from v$process where addr not in (select paddr from v$session);

Check the buffer cache hit ratio

SELECT a.VALUE + b.VALUE logical_reads,
        c.VALUE phys_reads,
        round(100 * (1 - c.value / (a.value + b.value)), 4) hit_ratio
  FROM v$sysstat a,
       v$sysstat b,
       v$sysstat c  WHERE a.NAME = 'db block gets'  AND b.NAME = 'consistent gets'  AND c.NAME = 'physical reads';

If the hit ratio is below 90%, increase the db_cache_size parameter.
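
The arithmetic behind the query is 100 × (1 − physical reads / (db block gets + consistent gets)). The following is a minimal sketch of the same calculation, useful when the three statistics have been exported rather than queried live. The function name and the sample counter values are hypothetical.

```python
def buffer_cache_hit_ratio(db_block_gets, consistent_gets, physical_reads):
    """Return the buffer cache hit ratio in percent, rounded to 4 decimals."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        raise ValueError("no logical reads recorded yet")
    return round(100 * (1 - physical_reads / logical_reads), 4)


# Illustrative counters only, not taken from a real instance.
ratio = buffer_cache_hit_ratio(120_000, 880_000, 50_000)
print(ratio, "OK" if ratio >= 90 else "consider increasing db_cache_size")
```

These counters are cumulative since instance startup, so the ratio describes the whole uptime rather than the current workload.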

Check the shared pool (library cache) hit ratio

select sum(pinhits)/sum(pins)*100 from v$librarycache;

If it is below 95%, change the application to use bind variables, or adjust the size of the shared pool.

Check the sort area

select name,value from v$sysstat where name like '%sort%';

If the ratio sorts (disk) / (sorts (memory) + sorts (rows)) is too high, adjust sort_area_size (when workarea_size_policy=MANUAL) or pga_aggregate_target (when workarea_size_policy=AUTO).

Check the log buffer

select name,value from v$sysstat where name in ('redo entries','redo buffer allocation retries');

If redo buffer allocation retries / redo entries exceeds 1%, increase log_buffer.
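
The 1% rule is a simple ratio of the two statistics returned by the query. The following is a minimal sketch with hypothetical function names and sample values.

```python
def redo_retry_pct(allocation_retries, redo_entries):
    """Return redo buffer allocation retries as a percentage of redo entries."""
    if redo_entries == 0:
        raise ValueError("no redo entries recorded yet")
    return 100 * allocation_retries / redo_entries


def log_buffer_too_small(allocation_retries, redo_entries, limit_pct=1.0):
    """Return True when the retry percentage exceeds the limit."""
    return redo_retry_pct(allocation_retries, redo_entries) > limit_pct


# Illustrative counters only, not taken from a real instance.
print(redo_retry_pct(150, 10_000), log_buffer_too_small(150, 10_000))
```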

Find the SQL statements with the most buffer gets (logical reads)

SELECT t.ADDRESS,
       t.SQL_TEXT,
       RANK() OVER(ORDER BY t.buffer_gets DESC) AS rank_buffgets,
       to_char(100*ratio_to_report(t.buffer_gets) OVER(),'99.99') AS pct_buffergets
FROM v$sql t;