Note 546006 - Problems with Oracle due to operating system errors

Summary
Symptom
One of the following errors (summarized below as ORA-270xx) appears in the alert log or elsewhere:

ORA-27037: unable to obtain file status
ORA-27041: unable to open file
ORA-27044: unable to write the header block of file
ORA-27045: unable to close the file
ORA-27046: file size is not a multiple of logical block size
ORA-27047: unable to read the header block of file
ORA-27048: skgfifi: file header information is invalid
ORA-27052: unable to flush file data
ORA-27061: waiting for async I/Os failed
ORA-27063: skgfospo: number of bytes read/written is incorrect
ORA-27064: skgfdisp: cannot perform async I/O to file
ORA-27069: skgfdisp: attempt to do I/O beyond the range of the file
ORA-27070: skgfdisp: async read/write failed
ORA-27072: skgfdisp: I/O error
ORA-27091: skgfqio: unable to queue I/O
ORA-27102: out of memory
ORA-27123: unable to attach to shared memory segment

These messages are usually accompanied by more detailed error messages that identify the file in which the problem occurred and how the file was being accessed (read, write or open):

ORA-00202: controlfile: '<controlfile>'
ORA-00205: error in identifying controlfile, check alert log for
          more info
ORA-00206: error in writing (block <block>, # blocks ) of controlfile
ORA-00210: cannot open the specified controlfile
ORA-00221: error on write to controlfile
ORA-00255: error archiving log <lognr> of thread 1, sequence
ORA-00272: error writing archive log
ORA-00308: cannot open archived log '<archivelog>'
ORA-00312: online log <lognr> thread 1: <redolog>
ORA-00313: open failed for members of log group <loggroup> of thread 1
ORA-00321: log <log> of thread 1, cannot update log file header
ORA-00333: redo log read error block <block> count <count>
ORA-00334: archived log: '<archivelog>'
ORA-00340: IO error processing online <lognr> of thread 1
ORA-00345: redo log write error block <block> count <count>
ORA-00346: log member marked as STALE
ORA-01110: data file <file>
ORA-01114: IO error writing block to file <file>
ORA-01115: IO error reading block from file <file>
ORA-01116: error in opening database file <file>
ORA-01157: cannot identify/lock data file <datafile>
ORA-01565: error in identifying file <file>
ORA-19502: write error on file "<file>", block no <block>

KCF: write/open error block=<block> online=1
error=27041 txt: '<errortext>'
error=27047 txt: '<errortext>'
error=27063 txt: '<errortext>'
error=27069 txt: '<errortext>'
error=27070 txt: '<errortext>'
error=27072 txt: '<errortext>'
error=27091 txt: '<errortext>'
Automatic datafile offline due to write error on file <file>
OSD-04002: unable to open file
OSD-04006: ReadFile() failure, unable to read from file
OSD-04008: WriteFile() failure, unable to write to file

The last error messages in the stack are the most relevant when you analyze the error, for instance:

O/S-Error: (OS 112) There is not enough space on the disk.
<unix> Error: 28: No space left on device
O/S-Error: (OS 1) Incorrect function.
O/S-Error: (OS 121) The semaphore timeout period has expired.
<unix> Error: 4: Interrupted system call
<unix> Error: 9: Bad file number
<unix> Error: 32: Broken pipe
...
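Because the closing line of an error stack usually carries the operating system message, it can help to extract that line programmatically. The following Python sketch is only an illustration: the function name last_os_error and the sample stack are assumptions and not part of any SAP or Oracle tool, and the patterns cover only the two message shapes quoted in this note.

```python
import re

# Shapes of the operating system lines quoted in this note:
#   NT/W2K: "O/S-Error: (OS 112) <text>"
#   UNIX:   "<unix> Error: 28: <text>"   (<unix> is the OS name)
OS_ERROR_PATTERNS = (
    re.compile(r"O/S-Error:\s*\(OS\s+(\d+)\)\s*(.*)"),
    re.compile(r"Error:\s*(\d+):\s*(.*)"),
)


def last_os_error(stack_lines):
    """Return (code, text) of the last OS-level line in the stack, or None."""
    found = None
    for line in stack_lines:
        for pattern in OS_ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                found = (int(match.group(1)), match.group(2).strip())
                break
    return found


# Hypothetical error stack, assembled from message texts listed above.
sample_stack = [
    "ORA-01115: IO error reading block from file 5",
    "ORA-27072: skgfdisp: I/O error",
    "Linux Error: 28: No space left on device",
]
print(last_os_error(sample_stack))  # (28, 'No space left on device')
```

The ORA- lines are deliberately not matched, because this note treats the closing operating system message as the one that determines the cause.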
Other terms
Additional information.
Reason and Prerequisites
If Oracle cannot access a database file (for instance a data file, control file or redo log) for whatever reason, an ORA-270xx error appears. The actual cause can be derived from the additional error messages (usually from the operating system) that accompany the ORA-270xx error.

In the following, the <unix> placeholder stands for the name of the operating system that issued the operating system-dependent error message:

RELIANT:  SVR4
SOLARIS:  SVR4
HP-UX:    HPUX
AIX:      IBM / AIX RISC / System 6000
LINUX:    Linux
TRU64:    Digital Unix / Compaq Tru64 UNIX

The error messages below that begin with "O/S-Error" originate from NT / W2K.

The possible causes are grouped below according to the closing error message:
    1. O/S-Error: (OS 32) The process cannot access the file because it is
              being used by another process.
    O/S-Error: (OS 33) The process cannot access the file because
               another process has locked a portion of the file.
    2. O/S-Error: (OS 112) There is not enough space on the disk.
    <unix> Error: 28: No space left on device
    <unix> Error: 48: Operation not supported
    3. O/S-Error: (OS 2) The system cannot find the file specified.
    O/S-Error: (OS 3) The system cannot find the path specified.
    <unix> Error: 2: No such file or directory
    <unix> Error: 6: No such device or address
    <unix> Error: 52: Missing file or filesystem
    4. O/S-Error: (OS 5) Access is denied.
    <unix> Error: 13: Permission denied
    5. <unix> Error: 27: File too large
    6. O/S-Error: (OS 1) Incorrect function.
    O/S-Error: (OS 6) The handle is invalid.
    O/S-Error: (OS 21) The device is not ready.
    O/S-Error: (OS 23) Data error (cyclic redundancy check)
    O/S-Error: (OS 31) A device attached to the system is not functioning.
    O/S-Error: (OS 121) The semaphore timeout period has expired.
    O/S-Error: (OS 170) The requested resource is in use.
    O/S-Error: (OS 1117) The request could not be performed because
              of an I/O device error.
    O/S-Error: (OS 1392) The file or directory is corrupted and
              unreadable.
    <unix> Error: 4: Interrupted system call
    <unix> Error: 5: I/O error
    <unix> Error: 19: No such device
    <unix> Error: 22: Invalid argument
    <unix> Error: 78: Connection timed out
    <unix> Error: 83: System call timed out
    <unix> Error: 125: Unknown error
    OSD-04004: invalid file header
    OSD-04016: Error queuing an asynchronous I/O request.
    OSD-04023: SleepEx() failure, unable to Sleep
    7. <unix> Error: 14: Bad address
    8. O/S-Error: (OS 1450) Insufficient system resources exist to complete
    the requested service.
    9. <unix> Error: 25: Inappropriate ioctl for device
    10. <unix> Error: 23: File table overflow
    11. <unix> Error: 24: Too many open files
    12. <unix> Error: 12: Not enough space
    O/S-Error: (OS 8) Not enough storage is available to process this
               command
    13. O/S-Error: (OS 38) Reached end of file.
    14. <unix> Error: 11: Resource temporarily unavailable
    15. OSD-04026: Invalid parameter passed.
    16. OSD-04012: file size mismatch (OS <number>)
    17. <unix> Error: 9: Bad file number
    18. <unix> Error: 38: Function not implemented
    19. <unix> Error: 124: Unsupported attribute value
    20. <unix> Error: 32: Broken pipe
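The grouping above can be read as a lookup table from the closing message to the cause number. The following Python sketch transcribes the list; the names NT_CAUSES, UNIX_CAUSES, OSD_CAUSES and cause_number are assumptions made for this illustration. The UNIX numbers are taken from the list as it stands, although errno values differ between UNIX platforms, so treat a result as a pointer to the relevant item rather than as a diagnosis.

```python
import re

# Cause numbers transcribed from the list in this note.
NT_CAUSES = {
    32: 1, 33: 1, 112: 2, 2: 3, 3: 3, 5: 4,
    1: 6, 6: 6, 21: 6, 23: 6, 31: 6, 121: 6, 170: 6, 1117: 6, 1392: 6,
    1450: 8, 8: 12, 38: 13,
}
UNIX_CAUSES = {
    28: 2, 48: 2, 2: 3, 6: 3, 52: 3, 13: 4, 27: 5,
    4: 6, 5: 6, 19: 6, 22: 6, 78: 6, 83: 6, 125: 6,
    14: 7, 25: 9, 23: 10, 24: 11, 12: 12, 11: 14,
    9: 17, 38: 18, 124: 19, 32: 20,
}
OSD_CAUSES = {4004: 6, 4016: 6, 4023: 6, 4026: 15, 4012: 16}


def cause_number(closing_line):
    """Map a closing error message to its cause number, or None if unlisted."""
    match = re.search(r"O/S-Error:\s*\(OS\s+(\d+)\)", closing_line)
    if match:
        return NT_CAUSES.get(int(match.group(1)))
    match = re.search(r"OSD-(\d+)", closing_line)
    if match:
        return OSD_CAUSES.get(int(match.group(1)))
    match = re.search(r"Error:\s*(\d+):", closing_line)
    if match:
        return UNIX_CAUSES.get(int(match.group(1)))
    return None


print(cause_number("Linux Error: 28: No space left on device"))  # 2
```

The NT and UNIX codes are kept in separate tables because the same number can mean different things on each side (for example, 38 is cause 13 on NT and cause 18 on UNIX).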
Solution
    1. This is caused by a process external to Oracle that locks a file that Oracle wants to access. Check whether external software such as virus scanners, system monitoring tools or backup tools was active when the error occurred, and avoid running these tools when you are starting Oracle.
    2. This is caused by a file system overflow. Create sufficient space in the file system to correct the error. If the error occurs in connection with offline redo logs, this is an archiver stuck problem (see Note 391). Since control files increase automatically in certain situations, they can also experience space problems. The same applies for data files which extend automatically as a result of AUTOEXTEND.
              If the problem occurs with a TEMPFILE of an LMTS/T-PSAPTEMP (see Note 659946), the sparse-file property of the TEMPFILEs has not been taken into account. In this case, see Note 548221 and ensure that adequate space is available for the growth of the sparse files.
    3. Check whether you can currently access the directory that contains the file. Also, check whether the file in question exists. In exceptional cases, a file system overflow can also cause this error.
              To recover the missing files, see Note 491160.
    4. Check whether the authorizations for the file are set correctly (NT, W2K: "Full Control for Everyone"). The error can also occur if another process, such as an external backup tool or virus scanner, is blocking the file, or if a lock held by such a process persists after the process terminates. On UNIX, you should also check that the Oracle executable in $ORACLE_HOME/bin has the correct authorizations (6751) (Note 583861). If the error occurs when you execute the BR*TOOLS, check whether they have the correct authorizations (Note 651351).
    5. Check whether the user limits for ora<sid> and <sid>adm are set high enough (for example, with "ulimit -a"). In addition, you may need to adjust other operating system settings (for example, activating large files on HP-UX if files larger than 2 GB are to be used). See also Note 129439. If necessary, contact your operating system partner.
    6. In almost all cases, this error is caused by hardware or operating system problems. Check the system, together with your hardware partner and operating system partner, to correct the cause of the problem. Then carry out a consistency check on the entire database, as described in Note 23345, remove any inconsistencies using Note 365481 and restore the affected files from a backup.
              If Oracle crashes on AIX during operation with the error "Invalid argument" and ORA-27061, this may be caused by an AIX bug. Check whether APAR IY15138, which corrects this bug, is installed. If you use AIX 5.3.0.60 or 5.3.0.61, installing APAR IZ03260 may help (see MetaLink Note 452866.1).
              If the error "Invalid argument" occurs in connection with ORA-27123 when starting the database, the SGA is larger than permitted by the operating system. Either reduce the SGA size (shared_pool_size, db_block_buffers) or set the operating system limit to a higher value.
              If the FILESYSTEMIO_OPTIONS parameter was changed online to SETALL, an ORA-27041 error may occur with "Invalid argument" when you use external tools (BR*TOOLS, SQLPLUS, and so on) on AIX. In this case, stop the database and restart it. After that, the problem should be eliminated.
              If this error occurs with Oracle 9i or 10.2.0.2 and AIX 5L V5.3 in connection with ORA-27091 or ORA-27072, see Notes 980152 and 977410 and implement the corrections in AIX and/or Oracle.
    7. If you are using Oracle 8.0.4 or 8.0.5 on HP-UX, the error described in Note 138816 may be the cause. Install Oracle 8.0.4.4 or 8.0.5.2 or higher.
              If the error occurs on AIX, use Oracle 8.1 or higher to exclude a known bug in the interaction between Oracle and AIX (MetaLink document 1058718.6).
              Also check if there is a file system overflow and create sufficient space.
    8. Check the CPU load on the DB server (for example, by using transaction ST06). The idle share should not fall below an hourly average of 30%. Check whether individual processes consume a lot of CPU. The problem may be caused by active file compression on NT, for example. Deactivate such external resource consumers. Always ensure that sufficient CPU resources are available.
              If the problem persists in certain transactions, database accesses may be responsible for the high utilization, perhaps due to an Oracle bug. Check whether CPU consumption increases considerably when you execute the transaction, and optimize the database accesses if required.
    9. On SOLARIS, the error can be caused by a bug in Oracle Release 8.0.5 or lower if you are using data files that are larger than 2 GB. To solve this problem, upgrade to a more recent Oracle release (see Note 201302).
              If the error occurs with RMAN during a backup, this is an Oracle bug relating to data files that are larger than 4 GB, and which has been corrected as of Oracle 8.0.
              In exceptional cases, this error is caused by a hardware problem or a file system overflow.
    10. This is caused by an overflow in the file table. Increase the size of the corresponding UNIX kernel parameter (for example, NFILE on HP-UX). Note here that the following formula applies for the maximum number of file descriptors used simultaneously by Oracle:

      #File descriptors = #Data files * processes
              "Processes" here means the init<sid>.ora parameter processes. Therefore, with 100 processes and 50 data files, for example, you get a minimum value of 5000 for NFILE on HP-UX. For large installations with several hundred data files, values of around 100,000 are possible. Note 172747 contains further information for HP-UX. Contact your operating system partner for any questions you may have regarding the operating system.
    11. Use lsof or a corresponding operating system command to check which files are presently open and whether there are any noticeable problems.
              Increase the operating system parameter used to define the maximum number of file descriptors per process (for example rlim_fd_max, rlim_fd_cur on SOLARIS), if it has not been sufficiently defined.
              Contact your operating system partner to eliminate possible bugs in the operating system.
              If the error occurs during an export with Oracle 9.2.0.1, this is caused by an Oracle bug. In this case, install a current patch set.
              The problem may also occur if the Oracle parameter file reads itself recursively because a parameter such as IFILE, PFILE, or SPFILE refers to the file itself.
    12. Check whether main memory and swap space are large enough. If this is not the case, either reduce the R/3 and Oracle buffers or increase the main memory or swap space.
              The error may be caused by shared memory segments left behind by an Oracle instance that was not stopped correctly. Check and correct this (for example, using ipcs and ipcrm).
              Also check whether your system is hitting a UNIX limit (such as the number of file descriptors) and correct the UNIX parameter settings if necessary.
              On TRU64, the per-proc-stack-size and max-per-proc-stack-size parameters may be relevant: neither of them should be set higher than 512 MB.
              The configuration of shared memory by using, for example, SHMMIN, SHMMAX, SHMMNI or SHMSEG may also be the cause of the problem.
              Check the ulimit settings of the users involved, such as ora<sid> and <sid>adm.
              See also Note 743328.
              If in doubt, contact your operating system partner.
    13. The file size is incorrect. In most cases, this error is caused by earlier hardware problems. Check this together with your hardware partner. When you have corrected the hardware problem, you must restore the original version of the file (for example, using restore/recovery for data files).
    14. Check the parameter settings for the operating system and also consult your operating system partner if necessary. On AIX, see Note 210385.
    15. If the error occurs during the migration to Oracle 8.0, see Note 137707.
              If the error occurs with Oracle 8.1.x on NT/W2K while using AUTOEXTEND/RESIZE, see Note 482435.
              On NT/W2K with Oracle 8.1.x, the problem can also occur because the Oracle software has been installed incorrectly or because obsolete patches (Version 8.1.7.2 or lower) are being used. To avoid this type of problem, you should uninstall the Oracle software using the Universal Installer. Then, in the correct sequence, install the Oracle basis release, the current patch set and, if necessary, the required hot fixes again. For more information, see the corresponding patch set note (for example, Note 362060 for Oracle 8.1.7).
              If the database has been installed with Oracle 7.2 or earlier and you want to upgrade to Oracle 8.1.7 on NT / W2K, see Note 564783.
    16. If a control file is affected, you can, with the database stopped, copy one of the other available control files (control_files parameter in init<sid>.ora) to its location.
              If the error occurs after a restore, this may be caused by an incorrect backup or a software error. Therefore, always obtain the latest patches for BR tools as described in Note 12741.
              If data files are affected, check in the Oracle alert log whether errors occurred with a RESIZE or with an AUTOEXTEND. If you do find errors there, investigate them further.
              Otherwise, check your hardware. Using the command

  SELECT BYTES FROM V$DATAFILE WHERE NAME = '<datafile>';
              you can check whether the file size that Oracle expects matches the actual size of the data file at operating system level. The size of a data file at operating system level must always be 8192 bytes larger than the value in V$DATAFILE. If required, restore the affected data files from a good backup and recover them.
    17. Check the authorizations of the Oracle executables as described in Note 583861. Also check the authorizations of the affected file: it must have the owner ora<sid> and must be readable and writable for this user.
    18. This error may occur if you start Oracle on LINUX with the XFS file system. XFS is supported in connection with Oracle as of SUSE SLES 10 Service Pack 1. See also SAP Note 1114181. However, XFS is not supported for Oracle on Red Hat Enterprise Linux.
    19. This error may occur on AIX if FILESYSTEMIO_OPTIONS is set to SETALL. Check whether the problem can be corrected by setting the CIO mount option at file system level (Note 999524). For a permanent solution to this error, create a customer message for SAP.
    20. Oracle MetaLink Note 550859.1 ("TROUBLESHOOTING GUIDE TNS-12518 TNS listener could not hand off client connection") contains information about analyzing this error.
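Two of the checks in this list reduce to simple arithmetic: the file descriptor estimate from item 10 and the file size comparison from item 16. The following Python sketch restates both. The function names are assumptions made for this illustration, and the 8192-byte offset is taken from item 16 as stated, not derived independently.

```python
# Offset between the OS-level file size and V$DATAFILE.BYTES, as stated in item 16.
DATAFILE_OS_OVERHEAD_BYTES = 8192


def min_file_descriptors(data_files, processes):
    """Item 10: #File descriptors = #Data files * processes (init<sid>.ora)."""
    return data_files * processes


def datafile_size_matches(os_size_bytes, v_datafile_bytes):
    """Item 16: the OS-level size must be exactly 8192 bytes larger than BYTES."""
    return os_size_bytes == v_datafile_bytes + DATAFILE_OS_OVERHEAD_BYTES


# Example from item 10: 100 processes and 50 data files.
print(min_file_descriptors(data_files=50, processes=100))  # 5000

# Hypothetical sizes: V$DATAFILE reports 100 MB, the OS reports 100 MB + 8192 bytes.
print(datafile_size_matches(104865792, 104857600))  # True
```

The OS-level size would come from a command such as "ls -l" on UNIX, and the expected size from the SELECT statement in item 16. A mismatch only indicates that the file needs closer investigation as described there.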


If the specific measures described above have not corrected the problem, we recommend that you perform the following steps:
  • Create an SAP message.
  • Check your operating system log (NT event log, UNIX syslog).
  • Run a database consistency check as described in Note 23345.
  • Check your hardware.
  • Contact your hardware and operating system partners.


As a last option, we recommend that you perform a restore/recovery of the relevant data files.
However, you should always consult Oracle Development Support before you try this option.

Header Data
Release Status: Released for Customer
Released on: 12.02.2010 13:39:43
Master Language: German
Priority: Recommendations/additional info
Category: Help for error analysis
Primary Component: BC-DB-ORA Oracle
Affected Releases
Release-Independent
Related Notes
980152 - DB reports I/O error after inst of AIX 5.3 TL 05, 06, 07, 08
977410 - Oracle error after upgrade to AIX 5.3 TL05
743328 - Composite SAP note: ORA-27102
659946 - FAQ: Temporary tablespaces
651351 - BR tools on UNIX: Error due to executable permissions
583861 - UNIX: Errors due to Oracle executable
564783 - Addit. Info. for Win NT/2000 for upgrade to Oracle 8.1.7.4
548221 - Temporary Files are created as sparse files
491160 - Restore scenarios for lost files of oracle databases
365481 - Block corruptions
328785 - ORA-00376: File cannot be read at this time
210385 - AIX 4.3.3: error: 11: Resource temporarily unavailable
201302 - ORA-1115/27072 I/OError on Solaris f. data file>2GB
172747 - SAP on HP-UX: OS kernel parameter recommendations
138816 - Oracle Archiver stuck on HP/UX 11
137707 - ORA-27069 for convert.sql on NT-INTEL
129439 - Maximum file sizes with Oracle
23345 - Consistency check of ORACLE database
12741 - Current versions of BR*Tools
8812 - Increasing max. number of data files in Oracle
