Note 521264 - Hang situations

Summary
Symptom
The Oracle database hangs when you start or stop the system, or while the system is running.
Other terms
Hang situations, hang, hangs, hanging, loops, loop, brbackup, brarchive, brrestore, brconnect, brtools
Reason and Prerequisites
In various situations, the database may hang for a number of reasons. Since troubleshooting is frequently difficult due to the absence of error messages, this note describes possible reasons for this problem, and also provides general hints for troubleshooting.

Whenever the database hangs, check to see if the system issued informative error messages; in particular, check the Oracle alert log. Frequently, you can use these error messages (for example, archiver stuck as described in Note 391) to accurately identify the cause of the problem. Therefore, this note mainly describes database hang situations for which there are inadequate error messages or no error messages at all.

In addition to the hang situations described here, you must also check whether there is a hardware problem or a problem with the operating system, as these may also cause hang situations (for example, if the controller no longer writes data to the hard disk).

Under certain circumstances, serious performance problems may cause the database to hang temporarily, or they may at least give the user the impression that the database is hanging. This note does not describe the causes for these problems. Instead, refer to the composite SAP performance Note 354080, which describes general performance problems. For example, the problem described in Note 488583 may cause the database to hang temporarily.

Known hang situations:
    1. Logging on to the database:
      a) WINDOWS: DLL problems
      b) HP-UX 11 (64-bit): Missing HP-UX patches
      c) Oracle 10g: Active resource plan
    2. Starting the database:
      a) HP-UX: The startup hangs when it moves to the NOMOUNT phase.
      b) SOLARIS, RELIANT: The startup hangs when it moves to the MOUNT phase.
      c) UNIX: The startup hangs when it moves to the OPEN phase.
      d) WINDOWS, Oracle 9i: It takes the database a very long time to start.
      e) TRU64, Oracle 9i: The startup hangs when it moves to the OPEN phase.
    3. While the system is running:
      a) Changes are no longer possible and there is no log file switch.
      b) Oracle 8.1.7.3 or higher: You can no longer access the database - the CPU load is 100%.
      c) WINDOWS with AWE: You can no longer access the database - the CPU load is high.
      d) WINDOWS Itanium, Oracle 9i: Hang situations last for seconds or minutes.
    4. Stopping the database:
      a) NT / W2K: "shutdown normal" hangs.
      b) W2K, Oracle 8.0.6: "shutdown immediate" / "shutdown normal" hangs.
      c) "shutdown immediate" hangs.
      d) "shutdown immediate" / "shutdown normal" hangs.
      e) "shutdown immediate" / "shutdown normal" hangs.
      f) Oracle 8 (8.1.6 or lower), SOLARIS / RELIANT: "shutdown immediate" / "shutdown normal" hangs.
      g) NT, Oracle 8.0: Shutdown hangs.
      h) Oracle 9i 9.2.0.5 or lower, SOLARIS: "shutdown immediate" / "shutdown normal" hangs.
      i) Oracle 9.2.0.5, Security Alert #68: "shutdown immediate" / "shutdown normal" hangs.
      j) Oracle 10g 10.2.0.2 or lower: "shutdown immediate" / "shutdown normal" hangs.
      k) Oracle 9.2.0.8 and higher and Oracle 10.2.0.2 and higher: "shutdown immediate" / "shutdown normal" hangs.
      l) Oracle 10g: "shutdown immediate" / "shutdown normal" hangs.
      m) Oracle 9.2.0.7, 9.2.0.8, 10.2.0.2, 10.2.0.4: "shutdown immediate" / "shutdown normal" hangs.
You can use the ORADEBUG Oracle tool to analyze a hang situation in accordance with Note 613872.
Solution
    1. Logging on to the database:
      a) If you cannot start work processes because they hang when you log on to the database, this may be due to an incorrect Oracle DLL. For more information, see Note 527525.
      b) If tools such as SAPDBA or the BR*Tools hang when you log on to the database, check the HP-UX patches in accordance with Note 431881.
      c) Check in accordance with Note 619188 whether "resmgr:become active" waits occur. If required, switch off the resource manager as described in the relevant section of Note 619188.
    2. Starting the database:
      a) If Async I/O is used on HP-UX, Oracle may hang with "startup nomount". The following error message is displayed after two minutes:

        ORA-00445: background process "PMON" did not start after 120 seconds
         This problem is due to a missing MLOCK privilege for the dba group and the oper group. First use "getprivgrp dba" and "getprivgrp oper" to check whether this privilege exists. If it does not exist, you can assign it as root either temporarily by using "setprivgrp dba MLOCK"/"setprivgrp oper MLOCK" or permanently by creating the /etc/privgroup file with the following contents:

  dba MLOCK
  oper MLOCK
      b) If Oracle hangs on SOLARIS or RELIANT when it moves to the MOUNT phase ("alter database mount" is the last entry in the alert log), the bug described in Note 329741 is the probable cause if the system has been running for more than 24.8 or 248 days without a reboot.
      c) If Oracle hangs when it moves from status MOUNT to OPEN, but a message is not issued, the problem may be caused by data files that incorrectly contain an S bit in the access authorizations. Check the authorizations for the data files. If an "S" appears anywhere (for example, "rw-r-S---"), reset it using chmod (for example, "chmod 640 <file>" for "rw-r-----").
      d) See Note 656809 and make sure that the background_dump_dest parameter is set to an existing directory (SAP standard: <drive>:\oracle\<sid>\saptrace\background).
      e) The problem may be triggered by incorrect authorizations for the (optional) /dev/timedev device. Make sure that the Oracle user can also read the device by executing the following command as the root user:

      chmod +r /dev/timedev
         This problem is also indicated by the fact that the SMON process consumes 100% CPU while the database hangs. In addition, trace files are written with "Waiting for smon to enable cache recovery".
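
The S bit check from item c) above can be sketched in Python. This is a minimal illustration, not part of the original note: the function names has_s_bit and files_with_s_bit are hypothetical, and the sketch treats any setuid or setgid bit (shown as "S" or "s" by ls) as suspicious.

```python
import os
import stat


def has_s_bit(mode: int) -> bool:
    """Return True if the setuid or setgid bit is set in a file mode."""
    return bool(mode & (stat.S_ISUID | stat.S_ISGID))


def files_with_s_bit(paths):
    """Yield (path, permission string) for every file that carries an S bit.

    `paths` is any iterable of data file paths supplied by the caller.
    """
    for path in paths:
        mode = os.stat(path).st_mode
        if has_s_bit(mode):
            # stat.filemode renders the mode the same way "ls -l" does,
            # for example "-rw-r-S---" for a setgid bit without execute.
            yield path, stat.filemode(mode)
```

Files reported by such a check would then be reset with chmod as described in item c), for example "chmod 640 <file>".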
    3. While the system is running:
      a) In addition to the archiver stuck problem (Note 391) and a "Checkpoint not complete" error (Note 79341), both of which are easily identified by the entries in the alert log, the database hang situation may also be caused by the log_archive_start parameter: if the parameter is set to "false" even though the archive log mode is activated, the archiver process does not run and the online redo logs to be archived are not saved. The log writer then refuses to perform a log switch if it encounters one of the (unsaved) online redo logs. Set log_archive_start to "true" and restart the database to correct this problem. If you cannot restart the database immediately, you can start archiving temporarily by using "archive log start" in SQLPLUS.
      b) See Note 515080 and check that shared_pool_size is sufficiently large. Note 514758 contains the relevant bug fixes. Notes 505246 and 507254 contain other instances of the same bug.
      c) The database may hang if you use Address Windowing Extensions (AWE) and you define an AWE_WINDOW_MEMORY that is too small. For more information, see Note 603041.
      d) See Notes 904662 and 908727.
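
The condition from item a) above (archive log mode active, but log_archive_start not set to "true") can be expressed as a small check. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, the parameter file is assumed to use simple "name = value" lines with "#" comments, and a missing log_archive_start entry is treated as "false".

```python
import re


def read_init_parameter(init_ora_text: str, name: str):
    """Return the last value assigned to `name` in init.ora-style text, or None."""
    value = None
    for line in init_ora_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"(?i)^([a-z0-9_$.*]+)\s*=\s*(.+)$", line)
        if match and match.group(1).lower() == name.lower():
            value = match.group(2).strip().strip("'\"")
    return value


def archiver_not_started(init_ora_text: str, archivelog_mode: bool) -> bool:
    """True if archive log mode is active but log_archive_start is not "true".

    Assumption: a missing entry counts as "false".
    """
    value = read_init_parameter(init_ora_text, "log_archive_start")
    return archivelog_mode and (value or "false").lower() != "true"
```

Whether the database runs in archive log mode has to be determined separately (for example with "archive log list" in SQLPLUS) and passed in as archivelog_mode.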
    4. Stopping the database:
      a) See Note 445275 and check if the Oracle Intelligent Agent is still running.
      b) The Oracle 8.0.6/W2K combination is not supported because, among other things, it does not facilitate a proper system shutdown (see Notes 156548 and 407314). Switch to a supported environment. In this context, the alert log frequently contains a message "Waiting for detached processes to terminate".
      c) Check the alert log to see whether redo logs are still written during the shutdown. If so, the hang situation is caused by a long rollback because a "shutdown immediate" rolls back all active transactions before stopping the database. A rollback can take as long as the original transaction ran. The alert log often contains the entry "Waiting for smon to disable tx recovery". If you cannot wait until the rollback is completed, you must stop the database with "shutdown abort". After you restart the system, it continues the rollback in the background.
      d) A hanging shutdown can also occur in connection with very extensive space transactions of the SMON process. In this case, refer to the "TYPE = ST" section from Note 745639.
      e) When you use table monitoring, the shutdown may hang due to an access to MON_MODS$. For more information, see Notes 604176 and 528527.
      f) If you are no longer able to stop the database on SOLARIS or RELIANT in the normal way, the bug described in Note 329741 may be responsible if the system has been running for more than 24.8 or 248 days without a reboot.
      g) See Note 128726 and implement one of the suggested workarounds or switch to a more recent Oracle release. There are also similar problems with Oracle 8.0.6.0 and 8.0.6.1. In this context, the alert log frequently contains a message "waiting for detached processes to terminate".
      h) If the system runs for more than 497 days without being rebooted, an internal time value is no longer incremented, which may result in various errors including a hanging shutdown (bug 3427424). Therefore, reboot your system if it has already run for 497 days and the shutdown hangs, or import a more recent patch set when this becomes available.
      i) After you import Security Alert #68 on Oracle 9.2.0.5, a hanging shutdown may occur if TCP.VALIDNODE_CHECKING is used (Note 186119) and if there are invalid IP addresses in TCP.INVITED_NODES. In this case, import Oracle 9.2.0.6 or higher or correct the setup of TCP.VALIDNODE_CHECKING.
      j) If the shutdown hangs with Oracle 10g 10.2.0.2 or lower, and if the alert log contains messages such as "PMON failed to acquire latch, see PMON dump", this is Oracle bug 4675523. This is fixed with Oracle 10.2.0.3 or higher.
      k) If you use Event 10626 in accordance with Note 869521, and if you also use Oracle 9.2.0.8 or higher or Oracle 10.2.0.2 or higher, a shutdown may hang for a longer period of time because the cleanup after a REBUILD or CREATE attempt has not yet been completed. In this case, the alert log contains entries such as the following:

      WARNING: event:10626 is set. pid:176 in online index build cleanup loop
         If the delay during shutdown poses a problem (it could take longer than 30 minutes), you should consider using Event 10629 instead of Event 10626.
      l) If you have activated the process DBCONSOLE in a way that is not in accordance with the SAP standard, ensure that you deactivate it.
      m) If the shutdown hangs and

      Active call for process <pid> user 'ora<sid>' program ...
      SHUTDOWN: waiting for active calls to complete.
         is logged in the alert log, this may occur in connection with Oracle bugs that lead to a loop of active processes (for example, the 10g parsing bugs 6795880 and 8575528). Therefore, make sure that the patch level and parameter level are as current as possible. To narrow down the cause, you can create a stack trace for the hanging process using "oradebug short_stack" as described in SAP Note 613872, and thereby identify the active source code areas.
      n) For more information, see Note 1120875.
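
The message texts quoted in the solution items above can be used for a first classification of a hang. The following Python sketch is only an illustration: the function name classify_hang_messages and the wording of the hints are hypothetical, and the pattern list contains only strings quoted in this note. A match is a hint for where to continue reading, not a diagnosis.

```python
# Message fragments quoted in this note, mapped to the matching solution item.
KNOWN_PATTERNS = [
    ("ORA-00445", "2a: check the MLOCK privilege (HP-UX with Async I/O)"),
    ("Waiting for smon to enable cache recovery",
     "2e: check read access to /dev/timedev (TRU64)"),
    ("Waiting for smon to disable tx recovery",
     "4c: long rollback during shutdown immediate"),
    ("Waiting for detached processes to terminate",
     "4b/4g: see Notes 156548, 407314 and 128726"),
    ("PMON failed to acquire latch", "4j: Oracle bug 4675523"),
    ("event:10626 is set", "4k: online index build cleanup, see Note 869521"),
    ("SHUTDOWN: waiting for active calls to complete",
     "4m: active calls, see Note 613872"),
]


def classify_hang_messages(log_text: str):
    """Return the hints whose message fragment occurs in the given log text.

    The comparison ignores case because the note quotes some messages
    with different capitalization.
    """
    lowered = log_text.lower()
    return [hint for pattern, hint in KNOWN_PATTERNS
            if pattern.lower() in lowered]
```

For example, passing the last lines of the alert log to classify_hang_messages returns a list of the solution items whose messages occur in that text, or an empty list if none of the quoted messages is present.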
Header Data
Release Status: Released for Customer
Released on: 10.05.2012  11:09:32
Master Language: German
Priority: Recommendations/additional info
Category: Help for error analysis
Primary Component: BC-DB-ORA Oracle
Secondary Components: BC-DB-ORA-DBA Database Administration with Oracle
Affected Releases
Release-Independent
Related Notes

 
1120875 - Shutdown immediate may take a long time > Oracle 9208/10202
908727 - Oracle 9 hangs on Windows 2003 64-bit and Service Pack 1
904662 - Database standstill on Oracle Rel >= 9.2.0.5 and MS 64-bit
869521 - Oracle <= 10g: TM locks with REBUILD ONLINE / CREATE ONLINE
825653 - Oracle: Common misconceptions
745639 - FAQ: Oracle enqueues
659946 - FAQ: Temporary tablespaces
656809 - DB appears to hang if background_dump_dest incorrect
618868 - FAQ: Oracle performance
613872 - Oracle traces with ORADEBUG
604176 - Performance deteriorates due to activated table monitoring
603041 - High CPU consumption and AWE
557447 - ORA-4031 After upgrade to Oracle 8.1.x.x
528527 - Shutdown immediate takes longer than expected (>10 min)
515080 - Importing Support Packages hangs. Work process with SEM -17
514758 - ORA-4031 and database hang after upgrade to 8.1.7.3
507254 - ORA-4031 and ORA-3113 when processing large IN lists
445275 - shutdown normal does not work (NT)
441802 - Reorganization with sapdba hangs during freespace check
431881 - SAPDBA and BR tools hang on HP-UX 11 64-bit
407314 - SAP kernel 6/7.x ORACLE: Released operating systems
354080 - Note collection for Oracle performance problems
329741 - Shutdown/startup hangs aft. 248 or 24,8 days uptime
214995 - Oracle locally managed tablespaces in the SAP environment
186119 - Restricting DB access to specific hosts
156548 - Released operating systems for SAP kernel 4.6x ORACLE
128726 - Hanging off-line Backup Oracle 8.0.4/8.0.5 on NT
79341 - Checkpoint not complete
391 - Archiver stuck
