Monitoring the Operation of Your Database - Oracle 10g

It is important that you monitor the operation of your database on a regular basis. Doing so not only informs you about errors that have not yet come to your attention but also gives you a better understanding of the normal operation of your database. Being familiar with normal behavior in turn helps you recognize when something is wrong.

This section describes some of the options available to you for monitoring the operation of your database.

A server-generated alert is a notification from the Oracle Database server of an impending problem. The notification may contain suggestions for correcting the problem. Notifications are also provided when the problem condition has been cleared.

Alerts are automatically generated when a problem occurs or when data does not match expected values for metrics, such as the following:

• Physical Reads Per Sec
• User Commits Per Sec
• SQL Service Response Time

Server-generated alerts can be based on threshold levels or can be issued simply because an event has occurred. Threshold-based alerts can be triggered at both threshold warning and critical levels. The value of these levels can be customer-defined or internal values, and some alerts have default threshold levels which you can change if appropriate. For example, by default a server-generated alert is generated for tablespace space usage when the percentage of space usage exceeds either the 85% warning or 97% critical threshold level. Examples of alerts not based on threshold levels are:

• Snapshot Too Old
• Resumable Session Suspended
• Recovery Area Space Usage

Background processes periodically flush metric data to the Automatic Workload Repository to capture a history of metric values. The alert history table and ALERT_QUE are purged automatically by the system at regular intervals.

The most convenient way to set and view threshold values is to use Enterprise Manager. To manage threshold-based alerts through Enterprise Manager:

• On the Database Home page, click on the Manage Metrics link at the bottom of the page to display the Thresholds page.
• On the Thresholds page, you can edit the threshold values.

You can view and change threshold settings for the server alert metrics using the SET_THRESHOLD and GET_THRESHOLD procedures of the DBMS_SERVER_ALERT PL/SQL package. The DBMS_AQ and DBMS_AQADM packages provide procedures for accessing and reading alert messages in the alert queue.

Setting Threshold Levels

The following example shows how to set thresholds with the SET_THRESHOLD procedure for CPU time for each user call for an instance:
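A call consistent with the arguments described in this section might look like the following sketch. The comparison operator DBMS_SERVER_ALERT.OPERATOR_GE is an assumption, because the text does not say which operator applies. The threshold values are passed as strings.

```sql
-- Sketch: set warning and critical thresholds for CPU time per user call.
-- OPERATOR_GE is an assumed choice; the text does not name the operator.
BEGIN
  DBMS_SERVER_ALERT.SET_THRESHOLD(
    DBMS_SERVER_ALERT.CPU_TIME_PER_CALL,      -- metric identifier
    DBMS_SERVER_ALERT.OPERATOR_GE, '8000',    -- warning operator and value (microseconds)
    DBMS_SERVER_ALERT.OPERATOR_GE, '10000',   -- critical operator and value (microseconds)
    1,                                        -- observation period, in minutes
    2,                                        -- consecutive occurrences
    'inst1',                                  -- instance name
    DBMS_SERVER_ALERT.OBJECT_TYPE_SERVICE,    -- object type
    'main.regress.rdbms.dev.us.oracle.com');  -- object (service) name
END;
/
```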

In this example, a warning alert is issued when CPU time exceeds 8000 microseconds for each user call and a critical alert is issued when CPU time exceeds 10,000 microseconds for each user call. The arguments include:

• CPU_TIME_PER_CALL specifies the metric identifier. For a list of supported metrics, see PL/SQL Packages and Types Reference.
• The observation period is set to 1 minute. This period specifies the number of minutes that the condition must deviate from the threshold value before the alert is issued.
• The number of consecutive occurrences is set to 2. This number specifies how many times the metric value must violate the threshold values before the alert is generated.
• The name of the instance is set to inst1.
• The constant DBMS_SERVER_ALERT.OBJECT_TYPE_SERVICE specifies the object type on which the threshold is set. In this example, the service name is main.regress.rdbms.dev.us.oracle.com.

Retrieving Threshold Information

To retrieve threshold values, use the GET_THRESHOLD procedure. For example:
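A minimal sketch of such a call, reusing the instance and service names from the SET_THRESHOLD discussion, could read as follows. The local variable names and the VARCHAR2(60) lengths are illustrative choices, and SET SERVEROUTPUT ON assumes the block is run from SQL*Plus.

```sql
SET SERVEROUTPUT ON

-- Sketch: read back the thresholds for CPU time per user call.
DECLARE
  warning_operator        BINARY_INTEGER;
  warning_value           VARCHAR2(60);   -- length is an illustrative choice
  critical_operator       BINARY_INTEGER;
  critical_value          VARCHAR2(60);
  observation_period      BINARY_INTEGER;
  consecutive_occurrences BINARY_INTEGER;
BEGIN
  DBMS_SERVER_ALERT.GET_THRESHOLD(
    DBMS_SERVER_ALERT.CPU_TIME_PER_CALL,
    warning_operator, warning_value,
    critical_operator, critical_value,
    observation_period, consecutive_occurrences,
    'inst1',
    DBMS_SERVER_ALERT.OBJECT_TYPE_SERVICE,
    'main.regress.rdbms.dev.us.oracle.com');

  DBMS_OUTPUT.PUT_LINE('Warning value:           ' || warning_value);
  DBMS_OUTPUT.PUT_LINE('Critical value:          ' || critical_value);
  DBMS_OUTPUT.PUT_LINE('Observation period:      ' || observation_period);
  DBMS_OUTPUT.PUT_LINE('Consecutive occurrences: ' || consecutive_occurrences);
END;
/
```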

You can also check specific threshold settings with the DBA_THRESHOLDS view. For example:
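A query along these lines could be used. The LIKE pattern is an assumption about how the CPU time metric is named in the view.

```sql
-- Sketch: list threshold settings for metrics whose name mentions CPU time.
SELECT metrics_name, warning_value, critical_value, consecutive_occurrences
  FROM dba_thresholds
 WHERE metrics_name LIKE '%CPU Time%';
```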

Next you must associate a database user with the subscribing agent, because only a user associated with the subscribing agent can access queued messages in the secure ALERT_QUE. You must also assign the enqueue privilege to the user. Use the ENABLE_DB_ACCESS and GRANT_QUEUE_PRIVILEGE procedures of the DBMS_AQADM package.
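The two calls might be sketched as follows. The agent name ALERT_AGENT and the database user ALERT_USER are hypothetical, and the agent is assumed to exist already and to be subscribed to the queue. SYS.ALERT_QUE is used as the qualified queue name.

```sql
-- Sketch with hypothetical names: agent ALERT_AGENT, database user ALERT_USER.
-- Assumes ALERT_AGENT already exists and is subscribed to the alert queue.
BEGIN
  -- Associate the database user with the subscribing agent.
  DBMS_AQADM.ENABLE_DB_ACCESS('ALERT_AGENT', 'ALERT_USER');

  -- Grant the queue privilege named in the text. 'DEQUEUE' and 'ALL' are
  -- the other values this procedure accepts.
  DBMS_AQADM.GRANT_QUEUE_PRIVILEGE('ENQUEUE', 'SYS.ALERT_QUE', 'ALERT_USER');
END;
/
```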

Optionally, you can register with the DBMS_AQ.REGISTER procedure to receive an asynchronous notification when an alert is enqueued to ALERT_QUE. The notification can be in the form of email, HTTP post, or PL/SQL procedure.

To read an alert message, you can use the DBMS_AQ.DEQUEUE procedure or OCIAQDeq call. After the message has been dequeued, use the DBMS_SERVER_ALERT.EXPAND_MESSAGE procedure to expand the text of the message.

The following dictionary views provide information about server alerts:

• DBA_THRESHOLDS lists the threshold settings defined for the instance.
• DBA_OUTSTANDING_ALERTS describes the outstanding alerts in the database.
• DBA_ALERT_HISTORY lists a history of alerts that have been cleared.
• V$ALERT_TYPES provides information such as group and type for each alert.
• V$METRICNAME contains the names, identifiers, and other information about the system metrics.
• V$METRIC and V$METRIC_HISTORY views contain system-level metric values in memory.
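As a brief illustration of how these views might be queried, the sketch below lists outstanding alerts and then reads the current values of two of the metrics named earlier in this section. The choice of columns is illustrative only.

```sql
-- Sketch: alerts that are currently outstanding, oldest first.
SELECT creation_time, object_type, object_name, reason, suggested_action
  FROM dba_outstanding_alerts
 ORDER BY creation_time;

-- Sketch: in-memory values for two system metrics. More than one row can be
-- returned for a metric that is tracked in more than one metric group.
SELECT metric_name, value, metric_unit
  FROM v$metric
 WHERE metric_name IN ('Physical Reads Per Sec', 'User Commits Per Sec');
```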

Monitoring the Database Using Trace Files and the Alert File

Each server and background process can write to an associated trace file. When an internal error is detected by a process, it dumps information about the error to its trace file. Some of the information written to a trace file is intended for the database administrator. Other information is for Oracle Support Services. Trace file information is also used to tune applications and instances.

The alert file, or alert log, is a special trace file. The alert file of a database is a chronological log of messages and errors, and includes the following items:

• All internal errors (ORA-600), block corruption errors (ORA-1578), and deadlock errors (ORA-60) that occur
• Administrative operations, such as CREATE, ALTER, and DROP statements and STARTUP, SHUTDOWN, and ARCHIVELOG statements
• Messages and errors relating to the functions of shared server and dispatcher processes
• Errors occurring during the automatic refresh of a materialized view
• The values of all initialization parameters that had nondefault values at the time the database and instance start

Oracle Database uses the alert file to record these operations as an alternative to displaying the information on an operator's console (although some systems also display information on the console). If an operation is successful, a "completed" message is written in the alert file, along with a timestamp.

Initialization parameters controlling the location and size of trace files are:

• BACKGROUND_DUMP_DEST
• USER_DUMP_DEST
• MAX_DUMP_FILE_SIZE

These parameters are discussed in the sections that follow.

Using the Trace Files

Check the alert file and other trace files of an instance periodically to learn whether the background processes have encountered errors. For example, when the log writer process (LGWR) cannot write to a member of a log group, an error message indicating the nature of the problem is written to the LGWR trace file and the database alert file. Such an error message means that a media or I/O problem has occurred and should be corrected immediately.

Oracle Database also writes values of initialization parameters to the alert file, in addition to other important statistics.

Specifying the Location of Trace Files

All trace files for background processes and the alert file are written to the directory specified by the initialization parameter BACKGROUND_DUMP_DEST. All trace files for server processes are written to the directory specified by the initialization parameter USER_DUMP_DEST. The names of trace files are operating system specific, but each file usually includes the name of the process writing the file (such as LGWR and RECO).
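To see where a given instance currently writes these files, the two parameters can be read from V$PARAMETER. This is a minimal sketch, and the values returned are installation-specific.

```sql
-- Sketch: show the directories used for background and server trace files.
SELECT name, value
  FROM v$parameter
 WHERE name IN ('background_dump_dest', 'user_dump_dest');
```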

Controlling the Size of Trace Files

You can control the maximum size of all trace files (excluding the alert file) using the initialization parameter MAX_DUMP_FILE_SIZE, which limits the file to the specified number of operating system blocks. To control the size of an alert file, you must manually delete the file when you no longer need it. Otherwise the database continues to append to the file.

You can safely delete the alert file while the instance is running, although you should consider making an archived copy of it first. This archived copy could prove valuable if you should have a future problem that requires investigating the history of an instance.
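The MAX_DUMP_FILE_SIZE limit described above could be set as in the following sketch. The figure of 10240 operating system blocks is an arbitrary illustrative value, not a recommendation.

```sql
-- Sketch: limit trace files written for the current session
-- to 10240 operating system blocks (illustrative value).
ALTER SESSION SET MAX_DUMP_FILE_SIZE = 10240;

-- Sketch: apply the same limit instance-wide.
ALTER SYSTEM SET MAX_DUMP_FILE_SIZE = 10240;
```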

Controlling When Oracle Database Writes to Trace Files

Background processes always write to a trace file when appropriate. In the case of the ARCn background process, it is possible, through an initialization parameter, to control the amount and type of trace information that is produced. Other background processes do not have this flexibility.

Trace files are written on behalf of server processes whenever internal errors occur. Additionally, setting the initialization parameter SQL_TRACE = TRUE causes the SQL trace facility to generate performance statistics for the processing of all SQL statements for an instance and write them to the USER_DUMP_DEST directory.

Optionally, you can request that trace files be generated for server processes. Regardless of the current value of the SQL_TRACE initialization parameter, each session can enable or disable trace logging on behalf of the associated server process by using the SQL statement ALTER SESSION SET SQL_TRACE. This example enables the SQL trace facility for a specific session:

ALTER SESSION SET SQL_TRACE = TRUE;

Use the DBMS_SESSION or the DBMS_MONITOR package if you want to control SQL tracing for a session.
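A minimal sketch of both packages follows. The session identifier 127 and serial number 29 are hypothetical; real values would come from V$SESSION.

```sql
-- Sketch: enable SQL tracing for the current session with DBMS_SESSION.
BEGIN
  DBMS_SESSION.SET_SQL_TRACE(TRUE);
END;
/

-- Sketch: trace another session with DBMS_MONITOR.
-- 127 and 29 are hypothetical SID and SERIAL# values of the kind returned by
--   SELECT sid, serial# FROM v$session;
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(127, 29, TRUE, FALSE);  -- waits on, binds off
END;
/

-- Turn tracing off again for that session.
BEGIN
  DBMS_MONITOR.SESSION_TRACE_DISABLE(127, 29);
END;
/
```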

Reading the Trace File for Shared Server Sessions

If shared server is enabled, each session using a dispatcher is routed to a shared server process, and trace information is written to the server trace file only if the session has enabled tracing (or if an error is encountered). Therefore, to track tracing for a specific session that connects using a dispatcher, you might have to explore several shared server trace files. To help you, Oracle provides a command line utility program, trcsess, which consolidates all trace information pertaining to a user session in one place and orders the information by time.

Monitoring Locks

Locks are mechanisms that prevent destructive interaction between transactions accessing the same resource. The resources can be either user objects, such as tables and rows, or system objects not visible to users, such as shared data structures in memory and data dictionary rows. Oracle Database automatically obtains and manages necessary locks when executing SQL statements, so you need not be concerned with such details. However, the database also lets you lock data manually.

A deadlock can occur when two or more users are waiting for data locked by each other. Deadlocks prevent some transactions from continuing to work. Oracle Database automatically detects deadlock situations and resolves them by rolling back one of the statements involved in the deadlock, thereby releasing one set of the conflicting row locks.

Oracle Database is designed to avoid deadlocks, and they are not common. Most often they occur when transactions explicitly override the default locking of the database. Deadlocks can affect the performance of your database, so Oracle provides some scripts and views that enable you to monitor locks.

The utllockt.sql script displays, in a tree fashion, the sessions in the system that are waiting for locks and the locks that they are waiting for. The location of this script file is operating system dependent.

A second script, catblock.sql, creates the lock views that utllockt.sql needs, so you must run it before running utllockt.sql.
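From SQL*Plus, the two scripts might be run in order as sketched below. The rdbms/admin directory under the Oracle home is an assumed location, because the actual location is operating system dependent. The "?" is SQL*Plus shorthand for the Oracle home.

```sql
-- SQL*Plus sketch, run as a suitably privileged user such as SYS.
-- The rdbms/admin path is an assumption; adjust it for your platform.
@?/rdbms/admin/catblock.sql
@?/rdbms/admin/utllockt.sql
```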

The following views can help you to monitor locks:

View Description table
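As an illustration, lock information might be queried as in the sketch below. V$LOCK and DBA_WAITERS are assumed here as representative lock views; they are not named in the text above. DBA_WAITERS is among the views created by catblock.sql.

```sql
-- Sketch: locks that block other sessions, and lock requests still waiting.
SELECT sid, type, id1, id2, lmode, request, block
  FROM v$lock
 WHERE block <> 0 OR request > 0
 ORDER BY id1, id2, request;

-- Sketch: pairs of waiting and holding sessions.
SELECT waiting_session, holding_session, lock_type, mode_held, mode_requested
  FROM dba_waiters;
```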

Monitoring Wait Events

Wait events are statistics that are incremented by a server process to indicate that it had to wait for an event to complete before being able to continue processing. A session could wait for a variety of reasons, including waiting for more input, waiting for the operating system to complete a service such as a disk write, or waiting for a lock or latch.

When a session is waiting for resources, it is not doing any useful work. A large number of waits is a source of concern. Wait event data reveals various symptoms of problems that might be affecting performance, such as latch contention, buffer contention, and I/O contention.

Oracle provides several views that display wait event statistics. A discussion of these views and their role in instance tuning is contained in Oracle Database Performance Tuning Guide.
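For a first look at wait activity, queries such as the following could be used. V$SYSTEM_EVENT and V$SESSION_WAIT are assumed here as examples of such views; the text above does not name them.

```sql
-- Sketch: instance-wide wait events, largest accumulated wait time first.
-- TIME_WAITED is reported in hundredths of a second.
SELECT event, total_waits, time_waited
  FROM v$system_event
 ORDER BY time_waited DESC;

-- Sketch: sessions that are waiting right now, and what they are waiting for.
SELECT sid, event, seconds_in_wait, state
  FROM v$session_wait
 WHERE state = 'WAITING';
```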

Process and Session Views

This section lists some of the data dictionary views that you can use to monitor an Oracle Database instance. These views are general in their scope. Other views, more specific to a process, are discussed in the section of this book where the process is described.