
Asmcmd or ASM Instance Backups or Queries Hang


    Sometimes an ASM instance hangs for no apparent reason, and this causes problems when backing up the ASM metadata. Queries against V$ASM_DISK and similar views may also hang. This blog post should go some way towards helping to diagnose the problem and providing a fix.

    ASM metadata backups on a couple of our servers had been failing. The backups were run from a system called CommVault, and the job scheduler there showed that they simply sat at 0% forever, or would have if we had allowed them! There were no error messages or codes to speak of.

    This was tracked down to the md_backup command being run from asmcmd. Running the command manually also just hung and the session had to be killed to release it.

    Attempting to query V$ASM_DISK, or other similar ASM views, from sqlplus on the ASM instance also hung.

    The following query against V$SESSION in the ASM instance shows the problem:

    
    set lines 350 trimspool on pages 300
    
    select sid, state, event, seconds_in_wait, blocking_session
    from   v$session
    where  blocking_session is not null
    or sid in (select blocking_session 
               from   v$session 
               where  blocking_session is not null)
    order by sid;
    
           SID STATE    EVENT                 SECONDS_IN_WAIT BLOCKING_SESSION
    ---------- -------- --------------------- --------------- ----------------
            15 WAITING  enq: DD - contention            73683              254
            16 WAITING  enq: DD - contention            15692              254
            17 WAITING  enq: DD - contention           117109              254
            93 WAITING  enq: DD - contention            61107              254
            95 WAITING  enq: DD - contention           242327              254
            96 WAITING  enq: DD - contention            68731              254
           167 WAITING  GPnP Get Item                 2471652
           172 WAITING  enq: DD - contention           117109              254
           173 WAITING  enq: DD - contention           147026              254
           176 WAITING  enq: DD - contention            37787              254
           177 WAITING  enq: DD - contention           658138              254
           178 WAITING  enq: DD - contention            42238              254
           251 WAITING  enq: DD - contention           315711              254
           253 WAITING  enq: DD - contention            97075              254
           254 WAITING  rdbms ipc reply                     0              167
           255 WAITING  enq: DD - contention             4140              254
           257 WAITING  enq: DD - contention           521537              254
    
    17 rows selected.
    

    Almost all of the hung sessions are waiting on sid 254. Sid 254 is itself waiting on sid 167, which is not waiting on another session but on the GPnP Get Item event.
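
The chain described above can also be followed mechanically. The sketch below is only an illustration: `root_blocker` is a hypothetical helper, not part of any Oracle tool, and it assumes the input is a list of `SID BLOCKING_SESSION` pairs, one per line, for example spooled from a query that selects just those two columns.

```shell
#!/bin/sh
# Hypothetical helper: follow BLOCKING_SESSION links from a starting SID
# until reaching a session that is not blocked by another session.
# Input on stdin: one "SID BLOCKING_SESSION" pair per line; the second
# field is missing for a session that no other session is blocking.
root_blocker() {
    awk -v sid="$1" '
        NF >= 2 { blocker[$1] = $2 }      # remember who blocks whom
        END {
            # The hop counter stops the walk if the input contains a cycle.
            for (hops = 0; (sid in blocker) && hops <= NR; hops++)
                sid = blocker[sid]
            print sid
        }'
}

# Example using three of the rows from the output above: prints 167
printf '%s\n' '15 254' '254 167' '167' | root_blocker 15
```

The session it ends on is the one to look at first; here that is sid 167, whose wait event is GPnP Get Item rather than another session.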

    A search of MOS shows that this is caused by an unpublished bug. Note 1375505.1 mentions killing the gpnpd.bin process with a HUP signal, which will cause it to restart immediately, and refers the reader to note 1392934.1 for full details. That latter note simply says:

    
    kill -HUP 
    

    Full details indeed! There’s not even a pid to be killed.

    In our specific case, the following was required:

    
    ps -ef | grep -i g\[p\]npd
    
    grid      4084     1  0  Jul 15  ?        04:37:48 /app/gridsoft/11.2.0.3/bin/gpnpd.bin
    
    su - grid
    Password: ******
    
    kill -HUP 4084
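
The same steps could be scripted so that the pid does not have to be copied by hand. This is a minimal sketch, not an Oracle-supplied procedure: `gpnpd_pid` is a hypothetical helper, and it assumes that the pid is the second column of `ps -ef` and that only one gpnpd.bin process is running, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the pid of gpnpd.bin from "ps -ef" output on stdin.
# Assumes the pid is the second column and the command path is the last field.
gpnpd_pid() {
    awk '$NF ~ /\/gpnpd\.bin$/ { print $2 }'
}

# Sample line copied from the listing above: prints 4084
sample='grid      4084     1  0  Jul 15  ?        04:37:48 /app/gridsoft/11.2.0.3/bin/gpnpd.bin'
printf '%s\n' "$sample" | gpnpd_pid

# On a live system, as the grid owner, the usage might look like this
# (left commented out because it signals a real process):
# pid=$(ps -ef | gpnpd_pid)
# [ -n "$pid" ] && kill -HUP "$pid"
```

Matching on the last field avoids the need for the `g\[p\]npd` trick, because the helper's own command line does not end in `gpnpd.bin`.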
    

    According to Oracle this is safe, even on production systems, and the gpnpd.bin process will be restarted automatically:

    
    ps -ef | grep -i g\[p\]npd
    
    grid     19015     1 14 09:23:10 ?        00:00:00 /app/gridsoft/11.2.0.3/bin/gpnpd.bin
    

    It can be seen from the above that the daemon is running and has a new pid and start time. If we check in the ASM instance again, there will be no waiting sessions, and the ASM metadata backups will work, as will queries against V$ASM_DISK etc.


    Written by Norman Dunbar
    Oracle DBA & developer. (Retired). Now a published book author!