Sometimes an ASM instance hangs for no apparent reason and this causes problems when backing up the ASM metadata. Running queries against V$ASM_DISK and similar views may also hang. This blog post should go some way to helping diagnose the problem, and providing a fix.
ASM metadata backups on a couple of our servers had been failing. The backups were run from a system called CommVault, and the job scheduler there showed that they simply sat at 0% forever, or would have if we had allowed them to! There were no error messages or codes to speak of: the job simply sat in CommVault at 0% and never ended.
This was tracked down to the md_backup command being run from asmcmd. Running the command manually also just hung, and the session had to be killed to release it.
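When reproducing a hang like this by hand, it helps to put a time limit on the command so the session does not need to be killed afterwards. The following is only a sketch: it assumes GNU coreutils `timeout` is available, the function name `run_with_deadline` is made up for illustration, and the backup file path in the usage comment is hypothetical. The md_backup syntax also differs between ASM releases, so check it against your own version.

```shell
# Run a command with a time limit (GNU coreutils `timeout`).
# Returns the command's own exit status, or 124 if the limit was hit.
run_with_deadline() {
    secs=$1
    shift
    rc=0
    timeout "$secs" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "command timed out after ${secs}s: $*" >&2
    fi
    return "$rc"
}

# Hypothetical usage, as the Grid Infrastructure owner with the ASM
# environment set (the backup file path is an example only):
#   run_with_deadline 300 asmcmd md_backup /tmp/asm_md.bkp
```

An exit status of 124 then means the command was still running when the limit expired, which is what a hung md_backup would produce.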
Attempting to query V$ASM_DISK, or other similar ASM views, from sqlplus on the ASM instance also hung.
Looking at V$SESSION in the ASM instance with the following query shows the problem:
|
|
Almost all hung sessions are waiting for sid 254. Sid 254 is itself waiting on sid 167, which is not waiting on a session but on the GPnP Get Item event.
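Following a chain like this by eye gets tedious when there are many sessions. As a rough sketch, the helper below reads "sid blocking_sid" pairs and walks from a given sid to the session at the head of the chain. The function name `root_blocker` and the starting sid 300 are made up for illustration; the pairs are assumed to come from the SID and BLOCKING_SESSION columns of V$SESSION.

```shell
# Read "sid blocking_sid" pairs on stdin and print the session at the
# head of the blocking chain that starts at the sid given as $1.
root_blocker() {
    awk -v start="$1" '
        # Only remember rows that actually name a blocker.
        NF >= 2 && $2 ~ /^[0-9]+$/ { blocker[$1] = $2 }
        END {
            sid = start
            # Follow the chain; the hop counter guards against cycles.
            while ((sid in blocker) && hops++ < NR) sid = blocker[sid]
            print sid
        }'
}

# Hypothetical usage against the ASM instance:
#   sqlplus -s "/ as sysasm" <<'EOF' | root_blocker 300
#   set heading off feedback off pagesize 0
#   select sid, blocking_session from v$session
#   where  blocking_session is not null;
#   EOF
```

With the sessions described above, any starting sid that waits on 254 would resolve to 167, and the EVENT column for that sid then shows what it is really waiting on.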
A search of MOS shows that this is caused by an unpublished bug. Note 1375505.1 mentions killing the gpnpd.bin process with a HUP, which causes it to restart immediately, and refers the reader to note 1392934.1 for full details. That latter note simply says:
|
|
Full details indeed! There’s not even a pid to be killed.
In our specific case, the following was required:
|
|
It can be seen that this is safe, according to Oracle, and the gpnpd.bin process will be automatically restarted, even on production systems!
|
|
It can be seen from the above that the daemon is running and has a new pid and start time. If we check in the ASM instance again, there will be no waiting sessions and the ASM metadata backups will work, as will querying V$ASM_DISK etc.
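The whole fix comes down to finding the pid of gpnpd.bin, sending it a HUP, and checking that a new pid appears. A minimal sketch follows, assuming a Linux or Unix host where `ps -ef` prints the pid in the second column. The helper name `gpnpd_pids` is made up for illustration, and the kill itself is left as a comment so that nothing is signalled by accident.

```shell
# Print the pid of every gpnpd.bin process found in `ps -ef` style
# input on stdin, skipping any grep or awk processes in the listing.
gpnpd_pids() {
    awk '/gpnpd\.bin/ && !/grep|awk/ { print $2 }'
}

# Hypothetical usage, as root or the Grid Infrastructure owner:
#   ps -ef | gpnpd_pids       # note the current pid
#   kill -HUP <pid>           # the daemon should be restarted automatically
#   ps -ef | gpnpd_pids       # expect a different pid afterwards
```

Comparing the pid before and after is the same check as reading the new pid and start time from the process listing.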