Understanding LGWR, Log File Sync Waits, and Commit Performance Issues

I. Overview:

1. How commit and log file sync work

2. Why log file sync waits take too long

3. How to measure where the problem lies

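II. Causes of log file sync waits

1. By default, committing a transaction has to wait on log file sync. This covers:

(1) User commits (user commit statistics can be viewed through v$sesstat)

(2) DDL: mainly caused by the recursive transaction commits it performs

(3) Recursive data dictionary DML operations

2. Rollbacks also cause log file sync waits

(1) User rollbacks: rollback operations issued by the user or by the application

(2) Transaction rollbacks: 1. internal Oracle rollbacks caused by failed operations; 2. space allocation or ASSM-related problems, long-running queries cancelled by the user, killed sessions, and so on.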

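The following diagram shows the flow of commit and log file sync:

Log file sync performance > disk IO speed (it depends on more than disk IO speed alone)

****Most of the log file sync wait time is actually spent on log file parallel write, much as DBWR waits on db file parallel write

****The rest of the log file sync wait is spent on scheduling latency, IPC latency, and so on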

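Let's walk through the whole log file sync wait:

1. The foreground process posts LGWR and then goes to sleep

At this point the log file sync wait starts being timed

On Unix platforms this post is implemented with semaphores

2. LGWR is woken up and gets a CPU time slice to do its work

LGWR issues the IO request

LGWR goes to sleep and waits on log file parallel write

3. When the IO call completes at the storage level, the OS wakes up the LGWR process

LGWR resumes once it gets a CPU time slice

At this point the log file parallel write wait is marked complete, and LGWR posts the foreground process

4. The foreground process is woken up by LGWR, gets a CPU time slice, and marks the log file sync wait complete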
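The four steps above can be mimicked with a small Python sketch. It is only an analogy, not Oracle code: two threads stand in for the foreground process and LGWR, `threading.Semaphore` stands in for the OS semaphores, and `time.sleep` with a hypothetical 20 ms latency stands in for the redo write. The names `run_commit`, `post_lgwr`, and `post_foreground` are invented for illustration. What it shows is that the time the foreground measures always contains the time LGWR measures, and that the difference is the posting and scheduling overhead.

```python
import threading
import time

# Semaphores standing in for the OS semaphores used for posting (an analogy).
post_lgwr = threading.Semaphore(0)        # foreground -> LGWR: "please write"
post_foreground = threading.Semaphore(0)  # LGWR -> foreground: "write is done"
timings = {}

SIMULATED_IO_SECONDS = 0.02  # hypothetical redo write latency, not a measured value


def lgwr():
    post_lgwr.acquire()                    # step 2: LGWR is woken up
    start = time.perf_counter()
    time.sleep(SIMULATED_IO_SECONDS)       # steps 2-3: IO issued, LGWR sleeps until it completes
    timings["log file parallel write"] = time.perf_counter() - start
    post_foreground.release()              # step 3: LGWR posts the foreground


def foreground_commit():
    start = time.perf_counter()            # step 1: the log file sync wait starts
    post_lgwr.release()                    # post LGWR ...
    post_foreground.acquire()              # ... and sleep until LGWR posts back
    timings["log file sync"] = time.perf_counter() - start  # step 4: wait ends


def run_commit():
    """Run one simulated commit; return (sync, write, overhead) in seconds."""
    timings.clear()
    writer = threading.Thread(target=lgwr)
    writer.start()
    foreground_commit()
    writer.join()
    sync = timings["log file sync"]
    write = timings["log file parallel write"]
    return sync, write, sync - write


if __name__ == "__main__":
    sync, write, overhead = run_commit()
    print(f"log file sync:           {sync * 1000:.2f} ms")
    print(f"log file parallel write: {write * 1000:.2f} ms")
    print(f"post/scheduling overhead: {overhead * 1000:.2f} ms")
```

On an idle machine the overhead is tiny. On a CPU-starved one it grows while the simulated IO time stays the same, which is the situation where log file sync is long but log file parallel write is not.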

Measuring LGWR speed with the snapper script:

---------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH
---------------------------------------------------------------------------------
1096, (LGWR) , STAT, messages sent , 12 , 12,
1096, (LGWR) , STAT, messages received , 10 , 10,
1096, (LGWR) , STAT, background timeouts , 1 , 1,
1096, (LGWR) , STAT, physical write total IO requests , 40, 40,
1096, (LGWR) , STAT, physical write total multi block request, 38, 38,
1096, (LGWR) , STAT, physical write total bytes, 2884608 , 2.88M,
1096, (LGWR) , STAT, calls to kcmgcs , 20 , 20,
1096, (LGWR) , STAT, redo wastage , 4548 , 4.55k,
1096, (LGWR) , STAT, redo writes , 10 , 10,
1096, (LGWR) , STAT, redo blocks written , 2817 , 2.82k,
1096, (LGWR) , STAT, redo write time , 25 , 25,
1096, (LGWR) , WAIT, LGWR wait on LNS , 1040575 , 1.04s, 104.1%, |@@@@@@@@@@|
1096, (LGWR) , WAIT, log file parallel write , 273837 , 273.84ms, 27.4%,|@@@ |
1096, (LGWR) , WAIT, events in waitclass Other , 1035172 , 1.04s , 103.5%,|@@@@@@@@@@|
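A few derived figures make the sample above easier to read. The sketch below only does arithmetic on the numbers in the output. Its assumptions: the redo block size is 512 bytes (it is platform dependent), the number of log file parallel write waits equals `redo writes` (snapper does not print the wait count here), and `redo write time` is reported in centiseconds. The function name `summarize` is invented for illustration.

```python
# Values copied from the LGWR snapper sample (DELTA equals the per-second rate,
# so the sample covers about one second).
redo_writes = 10
redo_blocks_written = 2817
redo_write_time_cs = 25                  # assumed unit: centiseconds (10 ms)
physical_write_total_bytes = 2884608
log_file_parallel_write_us = 273837      # wait time in microseconds

REDO_BLOCK_BYTES = 512  # assumption: common redo block size, platform dependent


def summarize():
    """Derive per-write averages from the snapper deltas."""
    redo_bytes = redo_blocks_written * REDO_BLOCK_BYTES
    return {
        "avg_blocks_per_write": redo_blocks_written / redo_writes,
        "avg_kib_per_write": redo_bytes / redo_writes / 1024,
        # Assumes one log file parallel write wait per redo write.
        "avg_lfpw_ms_per_write": log_file_parallel_write_us / 1000 / redo_writes,
        "redo_write_time_ms": redo_write_time_cs * 10,
        "bytes_written_per_redo_byte": physical_write_total_bytes / redo_bytes,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name:30s} {value:10.2f}")
```

Under these assumptions each redo write averages about 282 blocks and about 27 ms of log file parallel write, and the 250 ms of `redo write time` is in line with the 273.84 ms wait. The physical bytes written come out at exactly twice the redo bytes, which would be consistent with the redo being written twice (for example to two log members), though the sample alone cannot confirm that.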

LGWR and Async IO

oracle@linux01:~$ strace -cp `pgrep -f lgwr`
Process 12457 attached - interrupt to quit
^CProcess 12457 detached
% time seconds     usecs/call  calls     errors    syscall
------ ----------- ----------- --------- --------- --------------
100.00  0.010000    263        38        3          semtimedop
0.00    0.000000    0          213                  times
0.00    0.000000    0          8                    getrusage
0.00    0.000000    0          701                  gettimeofday
0.00    0.000000    0          41                   io_getevents
0.00    0.000000    0          41                   io_submit
0.00    0.000000    0          2                    semop
0.00    0.000000    0          37                   semctl
------ ----------- ----------- --------- --------- --------------
100.00  0.010000               1081      3          total

***With AIO, io_getevents is where the log file parallel write wait event is measured
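To compare the syscall counts without reading the table by eye, the `strace -c` summary can be parsed with a few lines of Python. This is a minimal sketch that assumes the column layout shown above, where the errors column is empty for most rows; `parse_strace_summary` is an invented helper name, and the data rows are copied from the sample.

```python
STRACE_SUMMARY = """\
100.00  0.010000    263        38        3          semtimedop
0.00    0.000000    0          213                  times
0.00    0.000000    0          8                    getrusage
0.00    0.000000    0          701                  gettimeofday
0.00    0.000000    0          41                   io_getevents
0.00    0.000000    0          41                   io_submit
0.00    0.000000    0          2                    semop
0.00    0.000000    0          37                   semctl
100.00  0.010000               1081      3          total
"""


def parse_strace_summary(text):
    """Parse `strace -c` rows into {syscall: {...}}, skipping non-data lines."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) not in (5, 6):      # header and blank lines fall out here
            continue
        try:
            pct = float(fields[0])
            seconds = float(fields[1])
            usecs_per_call = int(fields[2])
            calls = int(fields[3])
        except ValueError:                 # ruler lines made of dashes
            continue
        name = fields[-1]
        if name == "total":                # summary row, not a syscall
            continue
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows[name] = {
            "pct_time": pct,
            "seconds": seconds,
            "usecs_per_call": usecs_per_call,
            "calls": calls,
            "errors": errors,
        }
    return rows


if __name__ == "__main__":
    rows = parse_strace_summary(STRACE_SUMMARY)
    for name in ("io_submit", "io_getevents", "semtimedop"):
        print(name, rows[name]["calls"], rows[name]["errors"])
```

In this sample io_submit and io_getevents were each called 41 times, which fits the pattern of LGWR submitting an asynchronous write and then collecting its completion. The 3 errors on semtimedop may simply be timed sleeps that expired, but `strace -c` does not show the errno, so that is a guess.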
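Redo- and commit-related latch tuning
1. redo allocation latches: as the name suggests, the latches that protect space allocation when a process writes redo to the log buffer
2. redo copy latches: the latches needed while redo is copied from a private memory area into the log buffer. They are held until the redo stream has been copied into the log buffer, which is how LGWR knows the copy is finished and the buffers can be written to disk. While it waits for that, LGWR waits on the "LGWR wait for redo copy" event. The related tunable parameter is _log_simultaneous_copies
Wait events: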
1.log file sync
2.log file parallel write
3.log file single write
The related statistics can be obtained from v$sesstat and v$sysstat
