Couple of weeks ago we had a problem with one of our busiest databases.
The FRA was filling quite rapidly and we just could not free enough space. What made this problem interesting was the fact that Oracle was telling us more than 40% of the space was marked as reclaimable.
The problem:
We have automatic alerts when we are running out of space. When I had a look to the situation this is what I saw:
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
------------ ------------------ ------------------------- ---------------
CONTROLFILE .01 0 1
ONLINELOG .89 0 9
ARCHIVELOG 51.29 49.67 1181
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 40.43 34.83 1658
My first thought was to remove old files using RMAN, after checking all standby databases were in sync I execute the following command (we have space enough to store 7 days worth of archivelogs, but I wanted to play safe as we had many changes during the weekend and one change in particular, generated 60+ GB of archivelogs):
rman
RMAN> connect target /
RMAN> delete noprompt archivelog until time 'SYSDATE - 3';
List of Archived Log Copies
Key Thrd Seq S Low Time Name
------- ---- ------- - ------------------- ----
228321 1 71005 A 07.08.2010 14:27:00 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889
228323 1 71006 A 07.08.2010 14:28:09 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71006.2112.726416957
228330 1 71007 A 07.08.2010 14:29:16 +FRA/xxx/archivelog/2010_08_07/thread_1_seq_71007.1210.726417025
.....
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 08/09/2010 18:19:58
ORA-15028: ASM file '+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889' not dropped; currently being accessed
bingo, what should have been a simple operation now has just become more complicated…
The solution:
Technically we have plenty of space on the FRA, but Oracle is complaining that a file is being used. This file is old.
The only way I could fix the problem is by killing the process holding a lock on the file. This process has to be one of the arc processes.
In this case the solution was quite simple. The steps I followed were (bearing in mind that arc? processes are being restarted automatically by oracle):
1.- find the process id for arc:
ps -ef | grep -i ora_arc*
oracle 5607 1 1 19:02 ? 00:00:00 ora_arc9_prod1
2.- kill the running process:
kill -9 5607
3.- check the process is started again before killing more:
ps -ef | grep -i ora_arc9_prod1
4.- perform 2 and 3 for all arc? running for your instance.
Once we have killed all arc? processes and we are sure they have been re-started by Oracle I did the following:
1.- Perform couple of redo log switches and make sure are being propagated to all standby databases:
SQL> alter system switch logfile;
2.- Connect to RMAN and perform a backup of the archivelog “being in used”:
rman
RMAN> connect target /
RMAN> backup archivelog from sequence 71005 until sequence 71005 thread 1;
The output is:
Starting backup at 09.08.2010 18:50:02
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=620 instance=prod1 devtype=DISK
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=71005 recid=228321 stamp=726416895
channel ORA_DISK_1: starting piece 1 at 09.08.2010 18:50:04
channel ORA_DISK_1: finished piece 1 at 09.08.2010 18:50:11
piece handle=+FRA/xxx/backupset/2010_08_09/annnf0_tag20100809t185002_0.1097.726605405 tag=TAG20100809T185002 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:08
Finished backup at 09.08.2010 18:50:11
Starting Control File and SPFILE Autobackup at 09.08.2010 18:50:11
piece handle=/u04/oradata/prod/rman_disk/auto/PROD/c-4255406167-20100809-00_PROD comment=NONE
Finished Control File and SPFILE Autobackup at 09.08.2010 18:50:15
Once we have performed the backup, we can delete old archivelogs to release some space:
RMAN> delete noprompt archivelog until time 'SYSDATE - 3';
....
deleted archive log
archive log filename=+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71005.670.726416889 recid=228321 stamp=726416895
deleted archive log
archive log filename=+FRA/xxx/archivelog/2010_08_07/thread_1_seq_71006.2112.726416957 recid=228323 stamp=726416962
....
3.- Monitor the space on the FRA:
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROLFILE .01 .00 1
ONLINELOG .89 .00 9
ARCHIVELOG 49.09 46.83 1168
BACKUPPIECE .00 .00 0
IMAGECOPY .00 .00 0
FLASHBACKLOG 40.22 33.91 1648
couple of times:
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ ------------------------- ---------------
CONTROLFILE .01 .00 1
ONLINELOG .89 .00 9
ARCHIVELOG 3.10 .04 225
BACKUPPIECE .10 .00 1
IMAGECOPY .00 .00 0
FLASHBACKLOG 39.91 32.95 1633
As always, comments are welcome.