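This morning I got to Zhongjin at 8 o'clock to verify some data and found that the archive area of the near-term database (a 4-node 11gR2 RAC on AIX 6.1) was full and the database had hung. I asked the middleware guy who had arrived before me, and all he said was that he hadn't noticed anything abnormal…
My heart sank. Damn, if I had shown up a bit later there would definitely have been trouble, and the taxpayers would have been frantic… Without another word I went to free up some space first: I switched to the grid user and, inside asmcmd, used its OS-style commands to delete two directories one after the other.
While it was deleting the second directory, I suddenly got this error: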
ORA-15032: not all alterations performed
ORA-15028: ASM file '+FRA/bjschxcx/……' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)
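Checking with ls showed that only one file had not been deleted, and the database had already recovered from the hang. I tried deleting that file with RMAN, but still got the following error: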
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on default channel at 06/08/2012 13:20:35
ORA-15028: ASM file '+FRA/bjschxcx/……' not dropped; currently being accessed
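The archived logs I wanted to delete were several days old, so in principle nothing should still have been using them, even though GoldenGate instances from several vendors are configured on the near-term database. Freeing a bit of archive space did get the database running again, but I couldn't just leave this problem unresolved. I checked each vendor's GoldenGate instance: none of them was using the archived logs I wanted to delete, and none of the processes was lagging.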
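I searched MetaLink and found two or three notes describing this symptom.
One of them says the following. It clearly doesn't match my scenario, so I ruled it out first: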
Cause
The issue can be caused by any replication process running or hanging, holding this file.
For example a Golden Gate replication or shareplex replication process.
Solution
Stop the replication process and try deleting the file using rman or ASMCMD.
The other two are as follows:
Cause: An attempt was made to drop an ASM file, but the file was being
accessed by one or more database instances and therefore could not
be dropped.
Action: Shut down all database instances that might be accessing this
file and then retry the drop command.
Solution
Use the following to quickly find out which database instance holds the lock and to identify for restart:
ASMCMD [+] > lsof -G DG_ARCH
DB_Name Instance_Name Path
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_1_seq_72711.5178.785032231
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_1_seq_72720.4818.785040307
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_1_seq_72727.4616.785046605
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_1_seq_72730.4479.785049261
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_1_seq_72742.4395.785059089
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_2_seq_70382.2308.785047531
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_2_seq_70385.1835.785050225
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_2_seq_70402.3091.785064485
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_2_seq_70408.1211.785069875
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_2_seq_70410.4439.785071661
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_3_seq_67973.4354.785051059
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_3_seq_67974.2051.785051959
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_3_seq_67995.1876.785069891
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_6_seq_32158.4060.785046539
myprod myprod1 +dg_arch/myprod/datafile/tbs_hways_master_data_med.1698.750777283
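The listing above is plain column text, and on a busy RAC it can get long, so filtering it beats eyeballing it. Below is a minimal Python sketch, assuming the three whitespace-separated columns shown (DB_Name, Instance_Name, Path) and no spaces inside paths. The helper names `parse_lsof`, `instances_holding` and `files_by_instance` are hypothetical and not part of asmcmd; you would first save the output to text, e.g. by redirecting `asmcmd lsof` to a file. Matching is case-insensitive, which is an assumption that fits the mixed-case disk group names above (`-G DG_ARCH` versus `+dg_arch/...`).

```python
from collections import defaultdict


def parse_lsof(text):
    """Parse asmcmd lsof output into (db_name, instance_name, path) tuples.

    Assumes the layout shown above: a header line starting with DB_Name,
    then rows of exactly three whitespace-separated columns.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] == "DB_Name":
            continue  # skip blank lines, the header, and unexpected lines
        rows.append(tuple(parts))
    return rows


def instances_holding(text, path_fragment):
    """Return the sorted instance names holding a file whose path contains
    path_fragment (compared case-insensitively)."""
    fragment = path_fragment.lower()
    return sorted({inst for _, inst, path in parse_lsof(text)
                   if fragment in path.lower()})


def files_by_instance(text):
    """Group the open file paths by instance name."""
    grouped = defaultdict(list)
    for _, inst, path in parse_lsof(text):
        grouped[inst].append(path)
    return dict(grouped)


# Two rows copied from the note's listing, used as sample input.
sample = """DB_Name Instance_Name Path
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_1_seq_72711.5178.785032231
myprod myprod1 +dg_arch/myprod/datafile/tbs_hways_master_data_med.1698.750777283
"""

print(instances_holding(sample, "thread_1_seq_72711"))  # ['myprod1']
```

Passing the name of the file that refuses to drop tells you which instance is holding it, and therefore which instance needs restarting before RMAN or asmcmd can remove it.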
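These two look somewhat closer to my scenario. Following the example in the third note, I ran lsof -G FRA in asmcmd, but the command produced no output at all.
I wondered whether my version was the problem, but my environment runs database 11.2.0.3.0, and according to the note it should apply to my version as well: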
Oracle Server - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
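Around 9 o'clock, another colleague went to look at the help for the lsof command, and this one really stung: he simply typed lsof on its own,
and output just like the listing above appeared…
It turned out that instance 4 was holding that archived log. After we restarted node 4 at noon, the RMAN command deleted the archived logs normally.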
If you repost this, please credit the source and link to the original:
http://blog.csdn.net/xiangsir/article/details/8679761