Understanding ORACLE AWR Repor

Understanding AWR Report

Posted by Mich Talebzadeh in Oracle.
trackback

In the post Automatic Workload Repository  (AWR) Performance Monitoring Tool Basics , I described the basic set up and report generation for AWR. In this post we will try to understand the AWR report itself.
Before going further I must emphasise that this report was generated by running a PL/SQL block immediately after the instance was rebooted. The code was  used to simulate a typical OLTP workload with frequent insert/update/deletes and commits. The sample code:

  • Performs checkpoints immediately before and after PL/SQL block
  • Manually takes AWR snapshots before and after running PL/SQL block

The code is shown below

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
ALTER SYSTEM CHECKPOINT;
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
DECLARE
  type ObjIdArray is table of tdash.object_id%TYPE index by binary_integer;
  l_ids objIdArray;
  CURSOR c IS SELECT object_id FROM tdash;
BEGIN
  OPEN c;
  LOOP
    BEGIN
      FETCH c BULK COLLECT INTO l_ids LIMIT 100;
      FORALL rs in 1 .. l_ids.COUNT
        UPDATE testwrites
          SET PADDING1 =  RPAD('y',4000,'y')
        WHERE object_id = l_ids(rs);
      commit;
      FORALL rs in 1 .. l_ids.COUNT
        DELETE FROM testwrites
        WHERE object_id = l_ids(rs);
      commit;
      FORALL rs in 1 .. l_ids.COUNT
       INSERT INTO testwrites
       SELECT * FROM tdash t WHERE t.object_id = l_ids(rs);
      commit;
      EXIT WHEN C%NOTFOUND;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        NULL;
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Transaction failed');
    END;
  END LOOP;
  CLOSE c;
END;
/
ALTER SYSTEM CHECKPOINT;
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

The output from the AWR report is shown below.
The snapshot details
This is normally shown at the beginning of the report.

1
2
3
4
5
6
        Snap Id      Snap Time   Sessions Curs/Sess
      --------- -------------- --------   -----
Begin Snap:4413 17-May-12 09:22:34   28   3.0
  End Snap:4414 17-May-12 09:43:35   29   3.3
   Elapsed:          21.02 (mins)
   DB Time:          21.78 (mins)

Note that unlike the hourly usual sample, this sampling took 21 minutes to complete. The snapshots in this case were taken immediately before and after running the PL/SQL block.
Elapsed (time) is just the interval of time between the start and end snapshots. Another important quantity is database DB Time which is the sum of working sessions’ time.  So DB Time =  sum of database CPU time + waits. In systems with multiple concurrent active sessions DB Time can be larger than the elapsed time. This is because DB Time is a sum over all active sessions that are using CPU(s) or waiting for an event. Note that Background processes are not included in that.
Load Profile
Next we will look at the Load Profile in AWR report

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Load Profile      Per Second    Per Transaction Per Exec Per Call
~~~~~~~~~~~~   --------------- --------------- --------- -------
DB Time(s):            1.0                0.0       0.02    0.39
DB CPU(s):             0.1                0.0       0.00    0.04
Redo size:    27,284,637.6          916,465.9
Logical reads:    46,156.0            1,550.3
Block changes:    31,214.4            1,048.5
Physical reads:    2,446.5               82.2
Physical writes:   5,627.1              189.0
User calls:            2.7                0.1
Parses:               50.8                1.7
Hard parses:           0.8                0.0
W/A MB processed:      0.1                0.0
Logons:                0.1                0.0
Executes:             62.0                2.1
Rollbacks:             0.0                0.0
Transactions:         29.8

Per Second here means that this measure correlated to 1 second. It is worth looking at Per Second statistics to ensure that the amount of work the instance is doing is reasonable for the tasks it has to perform. For example is doing nearly 30 transactions per second good enough?
To understand what Per Transaction mean, we need to look at another statistic further down the report.

1
2
3
Statistic                     Total     per Second     per Trans
----------------------- ----------- -------------- -------------
user commits                 37,550           29.8           1.0

Per transaction is just per commits. So here we had 37,550 commits. To work out the total Redo size generated, we can take the Redo size throughput per transaction at 916,465.9 bytes and multiply that with the number of commits at 37,550. That will give us the total of 914465.9 * 37550/1024/1024 = 32747.4 MB of redo generated.
Other interesting statistics are Logical reads, Physical reads and Physical writes. These are quoted in blocks. so as before the total values would be 58,213,765 Logical block reads, 3,086,610 Physical block reads and 7,096,950 Physical block writes. Now a useful statistic would be the total amount of writes occurred in MB. Since the server is 8K block then the total amount of writes is (7096950 * 8192)/(1024 * 1024)  = 55,445 MB.
Instance Efficiency Percentages
This part is not that useful. Let us have a look at it

1
2
3
4
5
6
7
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
            Buffer  Hit   %:   94.70    In-memory Sort %:  100.00
            Library Hit   %:   95.49        Soft Parse %:   98.41
          Execute to Parse %:   18.06         Latch Hit %:  100.00
Parse CPU to Parse Elapsd %:   59.75     % Non-Parse CPU:   98.91

OK we have 100% targets in few places. On its own this section does not tell us much. Buffet Hit is 94.70% but this is expected as we booted the instance before the test. Then we have Library Hit at 95.49% for the same reason.
Parsing
Think of Parsing as basically a two step process, that of a Syntax Check to check the validity of the statement and that of a Semantic Check; to ensure the statement can execute properly. The next step in the parse operation is to see if the statement being parsed has already being  processed by another session.  If that is the case then we can skip the next two steps in the process, that ofoptimization and row source generation.  If we can skip these next two steps in the process, we have done what is known as a Soft Parse, a shorter process to getting our query going.  If we cannot, if we must do all of the steps, we are performing as what is known as a Hard Parse; we must parse, optimize and generate the plan for the query.  Needless to say that Hard Parsing is more CPU intensive than Soft Parsing. Note that we have 59.75% Parse CPU to Parse Elapsed. We are losing a fair bit of time while parsing. However, that should not matter as after reboot shared pool did not contain any useful compiled query and since parsing is such a small fraction of the work being done.
Top 5 Timed Foreground Events
This is probably one of the most important section of AWR report. In a nutshell it is all about wait events. It records the number of waits encountered in the period and the total time spent waiting together with the average time waited for each event. The section is ordered by the percentage  of the total database time that each Event is responsible for.

1
2
3
4
5
6
7
8
9
10
11
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                          Avg
                                         wait   % DB
Event                   Waits   Time(s)  (ms)   time Wait Class
---------------------------- ----------- ------ ------ ----------
db file sequential read 2,817,676    810   0   62.0 User I/O
DB CPU                               130        9.9
free buffer waits           5,505     70  13    5.3 Configurat
log file sync                 142     46 324    3.5 Commit
log buffer space              371    36   97    2.7 Configurat

Dependent on what is seen in this section, other report sections may need to be referenced in order to quantify or check the findings. For example, the wait count for a particular event needs to be assessed based upon the duration of the reporting period and also the number of active sessions at the time. In other words do stats look OK compared to the reported workload?
In above 62% of database time was spent waiting for db file sequential read, which in this case was about single block index reads. However, the average wait time was zero. Another 9.9% of the time was spent waiting for or using CPU time. A tenth of database time is not bad at all. In general High CPU usage is often a symptom of poorly tuned SQL (excessive logical I/O due to suboptimal search path or excessive physical I/O).
Further down we have 5.3% of database time spent on free buffer waits. Free buffer wait occurs when a user session reads a block from the disk and cannot find a free block in the buffer cache to place it in. One frequently attributed reason is that the DBWR process is not clearing the buffer cache fast enough. However, there can be many reasons and one generic answer will not help. You will need to investiggate the cause or causes of this problem.
The log file sync wait event reports 3.5% of database time. It is triggered when a user session issues a commit (or a rollback). The user session will signal or post the LGWR to write the log buffer to the redo log file. If you look at the sql code above we are issuing frequent commits after processing every 100 rows. Finally, we have Log Buffer Space wait event taking 2.7% of DB time. This event occurs when server processes write data into the log buffer faster than the LGWR process can write it out. Refer to Oracle documentation if you see high readings for this wait event.
The time model statistics
Time model statistics tell you the processing time spent on various matrics during the snapshot interval. You should not expect the % of DB Time to add up to 100% because there is overlap among statistics. For example sql execute elapsed time requires CPU time for sql execution. Note that some statistics such as background elapsed time and background cpu time are shown as well but these are not part of  % of DB Time.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Time Model Statistics                    DB/Inst: MYDB/mydb  Snaps: 4413-4414
-> Total time in database user-calls (DB Time): 1306.8s
-> Statistics including the word "background" measure background process
time, and so do not contribute to the DB time statistic
-> Ordered by % or DB time desc, Statistic name
Statistic Name                          Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time               1,247.7         95.5
DB CPU                                   129.9          9.9
connection management call elapsed time    5.5           .4
parse time elapsed                         4.2           .3
hard parse elapsed time                    3.7           .3
PL/SQL execution elapsed time              1.2           .1
PL/SQL compilation elapsed time            0.4           .0
hard parse (sharing criteria) elapsed time 0.1           .0
repeated bind elapsed time                 0.0           .0
hard parse (bind mismatch) elapsed time    0.0           .0
sequence load elapsed time                 0.0           .0
failed parse elapsed time                  0.0           .0
DB time                                1,306.8
background elapsed time                3,395.9
background cpu time                       19.6
-------------------------------------------------------------

We can go through these figures now. 129.9 seconds CPU time was used for all user sessions. This was just under 10% of database resources. In total there was 1,306.8 seconds database time used. The total wait event time can be calculated as 1,306.8 – 129.9 = 1,176.9 seconds. The lion share of database time (95.5%) was spent on executing sql which is a good sign. The toal parse time was 5.5 dseconds of which 4.2 seconds was hard parsing. The rest of statistics is tiny in this case
Foreground Wait Class
We are familiar with Oracle wait events. In AWR report under Wait Class, wait events (to be precise user generated wait events AKA Foreground wait events) are grouped into classes. These include User I/O, DB CPU, Configuration, Commit, Other, System I/O, Application, Network and Concurrency. For example, exclusive TX locks are generally an application level issue and HW locks are generally a configuration issue. The following list includes common examples of the waits in some of the classes:

  • User I/O: Waits for user IO (db file sequential read, db file scattered read, direct path read, direct path write etc)
  • DB CPU: See above for the definition of this class
  • Configuration: Waits caused by inadequate configuration of database or instance resources (for example, undersized log file sizes, shared pool size etc)
  • Commit: waits for redo log write confirmation after a commit
  • Other: Waits which should not typically occur on a system (for example, ‘wait for EMON to spawn’)
  • System I/O: Waits for background process IO
  • Application: Waits resulting from user application code (for example, lock waits caused by row level locking or explicit lock commands)
  • Network: waits for data to be sent over the network
  • Concurrency: Waits for internal database resources (for example, latches)

For other classes  not covered above,  you can get info from Oracle manuals.
Let us look at this section in our AWR report:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
                                                 Avg
                             %Time   Total Wait  wait
Wait Class      Waits        -outs   Time (s)    ms)  %DB time
----------- ---------------- ----- ---------------- -------- -----
User I/O          2,866,247     0     838        0      64.2
DB CPU                                130                9.9
Configuration         5,930     0     110       18       8.4
Commit                  142     0      46      324       3.5
Other                 1,971    96      20       10       1.6
System I/O              417     0       0        0       0.0
Application              88     0       0        0       0.0
Network               2,653     0       0        0       0.0
Concurrency               5     0       0        0       0.0
-------------------------------------------------------------

User I/O taking 64.2% of database time was explained before. The CPU usage was just under 9.9%. Configuration contributed 8.4% of wait time. These are attributed to database settings. We will see the events contributing to this class below.
Foreground Wait Events

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-> s  - second, ms - millisecond -    1000th of a second
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by wait time desc, waits desc (idle events last)
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0
                                                 Avg
                              %Time Total Wait  wait  Waits   % DB
Event                Waits    -outs   Time (s)  (ms)   /txn   time
-----------------  ------------ ---------- ------- -------- ------
db file sequential r 2,817,676    0   810       0     75.0   62.0
free buffer waits        5,505    0    70      13      0.1    5.3
log file sync              142    0    46     324      0.0    3.5
log buffer space           371    0    36      97      0.0    2.7
db file scattered r     48,458    0    28       1      1.3    2.1
rdbms ipc reply             10   90    19    1949      0.0    1.5
log file switch             48    0     4      81      0.0     .3
------------------------------------------------------------

For the sake of brevity I have only included those wait events that contribute to % DB timedb file sequential read and db file scattered read belong to user I/Owait class. log file sync is part of Commit wait class. free buffer waitslog buffer space, rdbms ipc reply and log file switch wait events belong to Configurationwait class.
Background Wait Events
This section of AWR report provides statistics on wait events for Oracle background processes.

1
2
3
4
5
6
7
8
9
10
11
12
13
Background Wait Events
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0
                                                 Avg
                              %Time Total Wait  wait  Waits   % bg
Event                Waits    -outs   Time (s)  (ms)   /txn   time
-----------------  ------------ ---------- ------- -------- ------
log file parallel write 10,319    0   845      82      0.3   24.9
db file parallel write  43,425    0   207       5      1.2    6.1
direct path write           55    0     2      40      0.0     .1
control file seq re     2,501     0     2       1      0.1     .1
-------------------------------------------------------------

The log file parallel write shows LGWR is waiting for blocks to be written to all online redo log members in one group. LGWR will wait until all blocks have been written to all members.   So here we had 24.9% of backhround total time spent on log file parallel write. Thedb file parallel write wait event belongs to DBWR process since it is the only process that writes dirty blocks from the SGA to datafiles.  DBWR process compiles a set of dirty blocks, hands the batch over to the OS, and waits on the db file parallel write event for the I/O to complete. The parameter of interest here is Avg wait (ms). In our case it is 5ms which is a perfectably respetable figure. Obviously larger average wait times point to slower I/O subsystem or poor I/O configurations.
Service Level Statistics
The service statistics section gives information about how particular services configured in the database are performing.

  • A service is a grouping of processes
  • Users may be grouped in SYS$USERS
  • A single user (application login) may be grouped with that user name
1
2
3
4
5
6
7
8
9
10
Service Statistics
-> ordered by DB Time
                                            Physical      Logical
 Service Name   DB Time (s)   DB CPU (s)    Reads (K)    Reads (K)
------------- ------------ ------------ ------------ ------------
SYS$USERS            1,281          129        3,079       58,172
mydb.mich.local         25            1            2           34
SYS$BACKGROUND           0            0            5           14
mydbXDB                  0            0            0            0
    -------------------------------------------------------------

Note that SYS$USERS service is a default service for all applications. You can configure different services for different production applications. This will enable DBA to identify the heavy loading services on the system. by grouping services according to the production need. For example services created for batch applications etc.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Service Wait Class Stats
-> Wait Class info for services in the Service Statistics section.
-> Total Waits and Time Waited displayed for the following wait
   classes:  User I/O, Concurrency, Administrative, Network
-> Time Waited (Wt Time) in seconds
Service Name
 ----------------------------------------------------------------
User I/O  User I/O  Concurcy Concurcy Admin  Admin Network Network
Tot Wts Wt Time   Tot Wts Wt Time Total Wts Wt Time Tot Wts Wt Time
--------- ------- -------- -------- --------- --------- ------- -------
SYS$USERS
2864754     838      5         0         0         0     796     0
mydb.mich.local
1493          0      0         0         0         0    1857     0
SYS$BACKGROUND
761          4      48         1         0         0       0     0
-------------------------------------------------------------

SQL Section
Any SQL statement appears in the top 5 statements in two or more areas below, then it is a prime candidate for tuning. The sections are:

  • Total Elapsed Time
  • Total CPU Time
  • Total Buffer Gets
  • Total Disk Reads
  • Total Executions
  • Total Parse Calls
  • Total Sharable Memory
  • Total Version Count

Let us try to see what these mean.
Total Elapsed Time
Total Elapsed Time = CPU Time + Wait Time. If a SQL statement appears in the total elapsed time area of the report this means its CPU time plus any other wait times made it pop to the top of the pile. Excessive Elapsed Time could be due to excessive CPU usage or excessive wait times.
This is the area that you need to examine and probably the one that will be reported by the users or application support. From a consumer perspective, the finer details don’t matter. The application is slow. Full stop.
In conjunction with excessive Elapsed time check to see if this piece of SQL is also a high consumer under Total CPU Time. It is normally the case. Otherwise check the wait times and Total Disk Reads. They can either indicate issues with wait times (slow disks, latch gets etc) or too much Physical IO associated with tables scans or sub-optimal indexes.  This section is a gate opener and often you will need to examine other sections.
Total CPU Time
When a statement appears in the Total CPU Time area this indicates it used excessive CPU cycles during its processing. Excessive CPU processing time can be caused by sorting, excessive function usage or long parse times. Indicators that you should be looking at this section for SQL tuning candidates include high CPU percentages in the service section for the service associated with this SQL (a hint, if the SQL is uppercase it probably comes from a user or application; if it is lowercase it usually comes from the internal or background processes). To reduce total CPU time, reduce sorting by using composite indexes that can cover sorting and use bind variables to reduce parse times.
Total Buffer Gets
Total buffer gets mean a SQL statement is reading a lot of data from the db block buffers. Generally speaking buffer gets (AKA logical IO or LIO) are OK, except when they become excessive. The old saying that you reduce the logical IO, because then the physical IO (disk read) will take care of itself holds true. LIO may  have incurred a PIO in order to get the block into the buffer in the first place. Reducing buffer gets is very important and should not be underestimated. To get a block from db block buffers, we have to latch it (i.e. in order to prevent someone from modifying the data structures we are currently reading from the buffer). Although latches are less persistent than locks, a latch is still a serialization device. Serialization devices inhibit scalability, the more you use them, the less concurrency you get. Therefore in most cases optimal buffer gets can result in improved performance. Also note that by lowering buffer gets you will require less CPU usage and less latching. |Thus to reduce excessive buffer gets, optimise SQL to use appropriate indexes and reduce full table scans. You can also look at improving the indexing strategy and consider deploying partitioning (licensed).
Total Disk Reads
High total disk reads mean a SQL statement is reading a lot of data from disks rather than being able to access that data from the db block buffers. High physical reads after a server reboot are expected as the cache is cold and data is fetched from the disk. However, disk reads (or physical reads) are undesirable in an OLTP system, especially when they become excessive. Excessive disk reads do cause performance issues. The usual norm is to increase the db buffer cache to allow more buffers and reduce ageing . Total disk reads are typified by high physical reads, a low buffer cache hit ratio, with high IO wait times. Higher wait times for Disk IO can be associated with a variety of reasons (busy or over saturated SAN, slower underlying storage, low capacity in HBC and other hardware causes). Statistics on IO section in AWR, plus the Oerating System diagnostic tools as simple as iostatcan help in identifying these issues. To reduce excessive disk reads, consider partitioning, use indexes and look at optimizing SQL to avoid excessive full table scans.
Total Executions
High total executions need to be reviewed to see if they are genuine executions or loops in SQL code. I have also seen situations where autosys jobs fire duplicate codes erroneously. In general statements with high numbers of executions usually are being properly reused. However, there is always a chance of unnecessary loop in PL/SQL, Java or C#. Statements with high number of executions, high number of logical and or physical reads are candidates for review to be sure they are not being executed multiple times when a single execution would serve. If the database has excessive physical and logical reads or excessive IO wait times, then look at the SQL statements that show excessive executions and show high physical and logical reads.
Parse Calls
Whenever a statement is issued by a user or process, regardless of whether it is in the SQL pool it undergoes a parse.  As explained under Parsing, the parse can be a hard parse or a soft parse. Excessive parse calls usually go with excessive executions. If the statement is using what are known as unsafe bind variables then the statement will be reparsed each time. If the header parse ratios are low look here and in the version count areas.
Sharable Memory
Sharable Memory refers to Shared Pool memory area in SGA , hence this perticular section in AWR Report states about the SQL STATEMENT CURSORS which consumed the maximum amount of the Shared Pool for their execution.
In general high values for Sharable Memory doesn’t necessary imply there is an issue It simply means that

  1. These SQL statements are big or complex and Oracle has to keep lots of information about these statements OR
  2. big number of child cursors exist for those parent cursors
  3. combination of 1 & 2

In case of point 2, it may be due to poor coding such as bind variables mismatch, security mismatch  or overly large SQL statements that join many tables. In a DSS or  DW environment large complex statements are normal. In an OLTP database large or complex statements are usually the result of over-normalization of the database design, attempts to use an OLTP system as a DW or simply poor coding techniques. Usually large statements will result in excessive parsing, recursion, and large CPU usage.
Version Count
High version counts are usually due to multiple identical-schema databases, unsafe bind variables, or Oracle bugs.
Instance Activity Statistics
This section compares the statistic values of instance activity between the two snapshot sets. For each statistic, the value of the statistic is shown along with the differentials measured by DB time, elapsed time, and per transaction. You have a comprehensive set of statistics here. Use these stats with the other stats
Tablespace IO Stats
This is a useful section in that it shows the tablespace IO activities. Remember the stats here are at the tablespace level.

1
2
3
4
5
6
7
8
9
10
11
12
13
Tablespace IO Stats
 -> ordered by IOs (Reads + Writes) desc
Tablespace ------------------------------
               Av       Av     Av               Av     Buf  Av Buf
    Reads     Reads/s  Rd(ms) Blks/Rd  Writes Writes/s Waits Wt(ms)
  ---------- ------- ------- -------  --------- ---- -------------
APP1 2,860,666  2,268   0.3     1.1  3,564,644   2,826   0    0.0
UNDOTBS1    90      0   9.8     1.0  1,990,759   1,578   0    0.0
SYSTEM   7,190      6   0.2     1.3        204       0   0    0.0
SYSAUX   1,938      2   0.3     2.1      1,047       1   0    0.0
APP1_HDD   105      0   0.1     1.0        105       0   0    0.0
-------------------------------------------------------------

These stats are very useful in identifying the throughput for each tablespace. For example, for tablespace APP1 we see in total 2,860,666 blocks read in. The throughput for read is 2,268 blocks per second and each read on average fetched 1.1 blocks. They were 3,564,644 block writes in total with a throughput of 2,826 blocks per second.Obviously those throughputs will tell if there has been any sub-optimal performance compared to previous records. If that is the case then you will need to examine the underlying disk sub-systems for the tablespace.
Taken from : http://talebzadehmich.wordpress.com/2012/06/13/understanding-awr-report/

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *