Sometimes an ASM instance hangs for no apparent reason, and this causes problems when backing up the ASM metadata. Queries against V$ASM_DISK and similar views may also hang. This blog post should go some way towards diagnosing the problem and providing a fix.
ASM metadata backups on a couple of our servers had been failing. The backups were run from a system called CommVault, and the job scheduler there showed that they simply sat at 0% forever, or would have if we had allowed them to! There were no error messages or codes to speak of - each job simply sat in CommVault at 0% and never ended.
This was tracked down to the md_backup command being run from asmcmd. Running the command manually also just hung and the session had to be killed to release it.
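When reproducing a hang like this by hand, it can help to put a time limit on the command so that the session does not need to be killed afterwards. The following is only a sketch: the function name `try_md_backup` and the 300 second default are made up for illustration, it assumes the GNU coreutils `timeout` utility is available, and it assumes the 11.2 form of `md_backup` that takes the backup file as its argument (older releases use a different syntax).

```shell
# Hypothetical helper: run "asmcmd md_backup <file>" with a time limit.
#   $1 = backup file to write
#   $2 = time limit in seconds (defaults to 300, an arbitrary choice)
try_md_backup() {
  timeout "${2:-300}" asmcmd md_backup "$1"
  rc=$?
  # GNU timeout exits with status 124 when it had to stop the command.
  if [ "$rc" -eq 124 ]; then
    echo "md_backup did not finish in time; the ASM instance may be hung" >&2
  fi
  return "$rc"
}
```

For example, `try_md_backup /tmp/asm_md.bkp 60` would give up after a minute rather than sitting there indefinitely.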
Queries against V$ASM_DISK and other similar ASM views, run from sqlplus on the ASM instance, also hung.
Looking at V$SESSION in the ASM instance with the following query shows the problem:
| |
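A query along the following lines would show this kind of information. It is a minimal sketch rather than the exact query used at the time: the function names are invented, the column selection is a guess at what is useful, and it assumes the environment is already set for the ASM instance and that the OS user can connect as SYSASM.

```shell
# Print the diagnostic SQL. The quoted heredoc delimiter stops the shell
# from trying to expand the "$" in v$session.
asm_blocker_sql() {
  cat <<'SQL'
set linesize 200 pagesize 100
column event format a40
column program format a30
select sid, serial#, state, event, blocking_session,
       seconds_in_wait, program
from   v$session
order  by blocking_session nulls last, sid;
exit
SQL
}

# Run the query against the local ASM instance (assumes ORACLE_HOME and
# ORACLE_SID already point at it, e.g. ORACLE_SID=+ASM1).
run_asm_blocker_query() {
  asm_blocker_sql | sqlplus -s "/ as sysasm"
}
```

The BLOCKING_SESSION column is the interesting one: sessions that share the same value there are all queued behind the same sid.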
Almost all of the hung sessions are waiting for sid 254. Sid 254 is itself waiting on sid 167, which is not waiting on another session at all, but on the GPnP Get Item event.
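With only a handful of sessions the chain is easy to follow by eye, but on a busier instance a small helper can walk it. The sketch below is purely illustrative: `root_blockers` is a made-up name, and it expects "sid blocking_sid" pairs on stdin, with "-" where a session has no blocker.

```shell
# Read "sid blocking_sid" pairs on stdin ("-" means no blocker) and print
# each root blocker followed by the number of sessions stuck behind it.
root_blockers() {
  awk '
    { blocker[$1] = $2 }
    END {
      for (sid in blocker) {
        cur = sid; hops = 0
        # Follow the chain upwards; the hop limit guards against cycles.
        while ((cur in blocker) && blocker[cur] != "-" && hops < 1000) {
          cur = blocker[cur]; hops++
        }
        if (cur != sid) behind[cur]++
      }
      for (root in behind) print root, behind[root]
    }'
}
```

Fed the chain described above plus two invented waiters (sids 12 and 31 are sample values, not from the real system):

```shell
printf '12 254\n31 254\n254 167\n167 -\n' | root_blockers
# prints: 167 3
```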
A search of MOS shows that this is caused by an unpublished bug. Note 1375505.1 mentions killing the gpnpd.bin process with a HUP, which causes it to restart immediately, and refers the reader to note 1392934.1 for full details. That latter note simply says:
| |
Full details indeed! There’s not even a pid to be killed.
In our specific case, the following was required:
| |
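As a rough sketch of that kind of fix, the following finds the pid of gpnpd.bin and sends it a HUP, as the MOS note describes. The function names are hypothetical, it assumes a `ps` that supports `-eo pid=,args=` (Linux does), and it needs to be run as a user allowed to signal the process, typically the Grid Infrastructure owner or root.

```shell
# Pick the pid of gpnpd.bin out of "pid command args..." lines on stdin.
# Only the executable name (second field) is matched, so a stray
# "grep gpnpd.bin" in the process list is ignored.
gpnpd_pid_from() {
  awk '$2 ~ /(^|\/)gpnpd\.bin$/ { print $1 }'
}

# Send HUP to the running gpnpd.bin; Clusterware should then restart it.
# Exactly one gpnpd.bin per node is expected.
restart_gpnpd() {
  pid=$(ps -eo pid=,args= | gpnpd_pid_from)
  if [ -z "$pid" ]; then
    echo "gpnpd.bin does not appear to be running" >&2
    return 1
  fi
  echo "Sending HUP to gpnpd.bin (pid $pid)"
  kill -HUP "$pid"
}
```

Running `ps -eo pid=,args= | gpnpd_pid_from` before and after calling `restart_gpnpd` is a quick way to confirm that the pid has changed.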
It can be seen that this is safe, according to Oracle, and the gpnpd.bin process will be automatically restarted - even on production systems!
| |
It can be seen from the above that the daemon is running and has a new pid and start time. If we check in the database again, there will be no waiting sessions and the ASM Metadata backups will work, as will querying V$ASM_DISK etc.