Analyzing Possible Performance Issues SAP BASIS

When you are observing the activity of a system via SM50 or SM66 and notice that the time accumulated by an occupied (running) work process starts to be excessive, watch the action that is being taken.

Possible Database Problems

Actions like Direct Read, Insert, and Delete are related to the database. A Sequential Read denotes a read of more than one row (which may or may not be a whole table). When the accumulated time for such actions is very high, the problem is often located in the database. To analyze the database in more depth, go to the Database Monitor by executing transaction ST04 or via the menu path Tools | Administration | Monitor | Performance | Database | Activity. Expensive SQL statements are frequently the root cause of these long times.

On other occasions, a Waiting for DB Lock action may indicate that a database lock is occurring and the way to analyze this is via transaction DB01 or following the menu path Tools | Administration | Monitor | Performance | Database | Exclusive Lock Waits. Please refer to the discussion on database performance later on in this chapter for a better understanding of the statistics shown by ST04 and DB01 and how to analyze and solve problems in the database.

Possible Memory Configuration Problems

Actions like Roll In, Roll Out, or Load Program taking a long time may indicate that certain memory areas are not configured properly. It is very likely that the SAP R/3 buffers are not large enough. Also, when a dialog work process is in status stopped and the reason shown is "PRIV," the work process is in "private mode": it is locked until the transaction that it is executing is finished, and an excessive amount of memory is being used. It is very likely that the memory areas configured for that instance are not large enough. To analyze the memory configuration in more depth, execute transaction ST02 or follow the menu path Tools | Administration | Monitor | Performance | Setup/Buffers | Buffers. Again, we will evaluate the statistics and how to detect and solve performance issues related to memory configuration later on in more detail.

Possible Communication Problems

When a work process is in status stopped, with the reason "CPIC," this may be due to a wrong or slow communication between servers or systems. Find out if there are network problems preventing a correct communication or if all the work processes in the other system are occupied and therefore the work processes in this instance are waiting for a communication from the other system.

Insufficient Work Processes

All work processes are used in a sequential mode. That means that when the dispatcher assigns requests to work processes, these will always be assigned in a sequential order. If the first work process of that type is occupied, the dispatcher will assign the request to the second one, and so on. For example, if there are 10 dialog work processes in an instance, numbered 0 to 9, the dispatcher will always assign a request to the first dialog work process (number 0) if it is free. If it is occupied, the dispatcher will assign the request to the second dialog work process (number 1), and so on.

This means that when you check the CPU time spent by each work process (as explained above), these times should be sequentially decreasing. If you observe that the last one or two work processes have high statistics, this means that all of the configured dialog work processes have been used at some point in time. Therefore, it is likely that users might have been waiting for a free resource (a free dialog work process in this case) and wait times (consequently, response times) might have been affected negatively.
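The effect described above can be illustrated with a small simulation. This is only a simplified sketch of "first free work process wins" assignment, not the actual dispatcher logic; the function names and the request format are hypothetical and chosen for illustration.

```python
def first_free(busy_flags):
    """Return the index of the first free work process, or None if all are busy."""
    for index, busy in enumerate(busy_flags):
        if not busy:
            return index
    return None


def simulate(num_processes, arrivals):
    """Simulate sequential (first-free) assignment of requests.

    arrivals: list of (start_time, duration) tuples sorted by start_time.
    Returns (busy_time_per_process, requests_that_had_to_wait).
    """
    free_at = [0.0] * num_processes    # time at which each work process is free again
    busy_time = [0.0] * num_processes  # accumulated time per work process
    waited = 0
    for start, duration in arrivals:
        busy_flags = [start < t for t in free_at]
        index = first_free(busy_flags)
        if index is None:
            # All work processes are occupied: the request waits for the
            # one that becomes free first.
            waited += 1
            index = min(range(num_processes), key=lambda i: free_at[i])
            start = free_at[index]
        free_at[index] = start + duration
        busy_time[index] += duration
    return busy_time, waited


busy, waited = simulate(3, [(0, 5), (1, 5), (2, 1), (6, 1)])
print(busy, waited)
```

In this model the accumulated times decrease from the first work process to the last one. If the last work process also shows a high value, or the count of waiting requests is greater than zero, the instance probably ran out of free work processes at some point.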

You should configure your instances so that this does not occur and you have sufficient work processes to handle the workload. Be aware that the number of work processes that you can configure is not infinite; it depends on the hardware resources (mainly physical memory and CPU) of that server.

The Workload Analysis Monitor

The Workload Analysis Monitor helps you to analyze statistics in order to check the health of a SAP system via the observation of statistics that are collected and kept in the system. You can analyze general problems, as well as specific issues in business transactions.

Collecting Data in the Workload Analysis Monitor

Certain collectors must run on a regular basis to obtain these statistics and make them available. The ABAP program RSCOLL00 collects performance statistical data and stores these data in table MONI. In every SAP R/3 system the system administrator must submit a background job that executes the RSCOLL00 program hourly. This job is usually called SAP_COLLECTOR_FOR_PERFMONITOR and must be submitted by user DDIC in client 000 to run periodically every hour. This program uses the special table TCOLL (data collector configuration table), which includes the list of specific collector programs to be executed and the running dates and times for each of those programs.

This is a standard SAP table and, although modifications are possible, make them only in accordance with SAP guidance. To access and maintain this table from the Workload Analysis Monitor (which can be accessed using transaction code ST03N), on the left-hand side of the screen go to section Collector | Performance Monitor Collector and choose Execution Times. You can decide how much data and how many dates should be kept in table MONI. To define retention times, go to Collector | Performance Database | Reorganization. These settings might have some impact on the overall system performance, so you should proceed with caution when changing them.

Retention times for collected data

Definition of Response Time and Statistics in the Workload Analysis Monitor

The response time of a particular transaction is divided into several components and the Workload Analysis Monitor keeps track of all these statistics at a very detailed level. Response Time comprises the time from the moment a user clicks the Execute button in a transaction, or enters data and presses ENTER, until the system gets back to the user with the necessary data to display. Basically, it is the time during which the user sees the hourglass on the screen while a request is being processed. This time is divided into the following components:

  • Wait time. Time spent waiting for a free work process. A user needs a work process to execute a request. The user will wait until a work process is free to perform the task that needs to be done. For example, a dialog work process can be used to read data from a table and display a list. An update work process can be used to send an update request to the database and change a table. The dispatcher maintains a queue (the dispatcher queue) where all requests are put in place waiting for a free work process to perform that task. Obviously, this wait time should be minimal. Otherwise, this would indicate a performance problem.
  • Roll-in time and roll-out time. When a work process is assigned to a user to perform a certain task, the work process must first roll in the user context. The user context comprises internal tables, variables, and so on that are stored in the memory of the application server. When a work process has finished its task, it releases the user context (roll out). The time spent rolling in the user context is accrued as roll-in time and the time spent rolling out the user context is roll-out time. Because the task is finished and the data are sent to the user before the roll-out, the roll-out time is not part of the response time of that user. However, it could negatively influence the wait time of a following user who may be waiting for a free work process.
  • Load time. Time spent loading all ABAP programs, screens, and so on that the user needs to execute in the SAP buffers, such as the Program Buffer and the Screen Buffer.
  • Database time. Amount of time spent since a request is sent from the database interface of the work process to obtain the data that is requested by the user (display data) or to send an update request (change). In a three-tier architecture, where the database server is in a separate server, the time spent in the network (network time) is included in the database time. The requests sent to the database are sent first to the SAP buffers to try to obtain the requested data. Accesses to a buffer are on average 100 times faster than to the database. Since release 4.6C of SAP R/3 it is also possible to gather statistics on DB procedure calls, by means of the Workload Monitor in Expert Mode view.
  • Enqueue time. This is the time a work process spends sending an enqueue request.
  • CPU time. To perform all the tasks indicated above (roll in, roll out, load programs, send requests to the database, etc.), the work process occupies the CPU for a certain period of time (cycles) and this time is measured and averaged with this statistic.
  • Processing time. This is the time that remains after subtracting the wait time, the load time, the database time, and the enqueue time from the total response time.
  • GUI time. Time spent in the front-end server processing screens and results from the application server. It includes part of the time spent in the network between the application and front-end servers, since one dialog step may perform several "trips" sending and receiving packets of data between the application server and the front-end server. The Net Time is the amount of time spent in the first and last transfer of data between the application and the front-end servers.
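The processing time definition in the list above is simple arithmetic, sketched here for one dialog step. The class and field names are hypothetical, the sample values are made up, and all times are assumed to be in milliseconds; the formula follows the definition given in this text only.

```python
from dataclasses import dataclass


@dataclass
class DialogStep:
    """Response time components of one dialog step, in milliseconds."""
    response_ms: float
    wait_ms: float
    load_ms: float
    database_ms: float
    enqueue_ms: float

    def processing_ms(self) -> float:
        # Processing time is what is left of the response time after the
        # wait, load, database, and enqueue times are subtracted.
        return self.response_ms - (
            self.wait_ms + self.load_ms + self.database_ms + self.enqueue_ms
        )


# Illustrative values only.
step = DialogStep(response_ms=800, wait_ms=20, load_ms=30, database_ms=300, enqueue_ms=2)
print(step.processing_ms())
```

Because processing time is derived rather than measured directly, any component that is recorded inaccurately shifts time into or out of it.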

All of these components of the response time of any transaction are recorded and kept in the system for a certain period of time. The Workload Analysis Monitor accesses these data to evaluate the performance and the history of response times of a system. To access the Workload Analysis Monitor, execute transaction ST03N or follow the menu path Tools | Administration | Monitor | Performance | Workload | Analysis.

The Workload Analysis Monitor

If you want to see aggregated statistics records from a global point of view, you can also use transaction ST03G, which shows the Global System Workload Analysis. Note that ST03N is a new monitor replacing the old ST03 transaction. In fact, in any SAP system based on SAP Web Application Server, executing transaction ST03 takes you to ST03N by default. The Workload Analysis Monitor is now divided into two sections. On the left-hand side you choose the function you want to execute and the analysis view you want to use. You can save the view that you last used by clicking on Save View; the next time you call this monitor, it will display the analysis view you chose, saving you the time of navigating through screens. There are three possible views, Expert Mode, Administrator, and Service Engineer, which give you more or less detailed data.

You can analyze individual servers or the system as a whole by choosing a particular server or Total on the left-hand side; the statistics are displayed on the right-hand side. As displayed in the figure, the workload overview shows the average response time by task type (or work process type) and a breakdown of the components of the response time in which a particular transaction spends its time. The meaning of all these components was explained in the previous section. The dialog response time is especially important to monitor, because it shows the performance experience of the user.

All the numbers are averaged by dialog steps and the statistical information that is displayed depends on the period of time that you choose. By default, this period runs from the start of the day until the moment you execute ST03N. It is possible to analyze a shorter or longer period of time by choosing a day, a week, or a month from the left side (note that this function is only available in the Expert Mode view). You can also specify intervals of minutes or hours in the Last Minute Load in the Detailed Analysis on the left-hand side of the screen for an analysis of a more specific time frame.

Analyzing the Data Provided by the Workload Analysis Monitor

This monitor provides you with several tabs with views of different breakdowns.

Breakdown of the components of the response time

The tab named Database specifies statistics about the database accesses and the average response time. For example, you are able to see how long a dialog task took on average to perform a sequential read. High average times here indicate possible problems in the database. The Roll Information tab shows the number of roll-ins and roll-outs and the average time taken by a work process to perform them. High average times here indicate possible problems with memory configuration.

The Parts of Response Time tab provides you with a breakdown in percentage of the components of the response time, making it very easy to identify where performance problems are located in a system. There is not a set or specific "good" response time, because it depends on many factors, such as the resources, but also the type of applications executed in that system, the level of customization and modifications, and so on. However, there are guidelines and rules of thumb to follow to determine whether these statistics reveal a performance problem or not. Not all work processes can be measured by the same thresholds. The following guidelines help to detect possible performance problems in dialog processing, so they apply to the statistics of the dialog work processes in the Workload Analysis Monitor:

  • The wait time should fall under 10% of the total response time. Otherwise, it may represent a performance problem from several possible causes. One cause may be insufficient work processes.
  • The load time should not be greater than 50 milliseconds and the roll times should not exceed 20 milliseconds. Otherwise, this may be due to problems with memory configuration, small buffer sizes, wrong parameter settings or a CPU bottleneck.
  • The enqueue time should be smaller than 5 milliseconds. Normally, there are no performance problems related to this service, but if these statistics were high, it would represent an important problem that might affect system stability as well. A network problem may also affect this time negatively.
  • Processing time should be below twice the CPU time. High processing time may indicate that the ABAP programs are very complex and the work process spends a large amount of time "interpreting" what is to be done.
  • Database time should be under 40% of the total response time. Many areas can negatively affect this time, such as problems in the database, like expensive SQL statements, but also wrong parameter settings at the database level. In addition, the network may negatively affect this time, and a contention problem in the physical disks (Input/Output problems) may also increase it and therefore considerably increase the total response time.
  • The GUI time and the net time should not be greater than 100 milliseconds. The hardware configuration in the presentation server, as well as the network, influences this time considerably. If these times are high, the perception of the user would be that the system is not performing well. However, the system may be responding well and there are problems in the network or in the presentation servers.
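The guidelines in the list above can be expressed as a small check over the average dialog-step times. This is a sketch under stated assumptions: the dictionary keys and the function name are hypothetical, all values are average times in milliseconds, and the sample numbers are invented for illustration. The thresholds are the rules of thumb from the text and are not hard limits.

```python
def check_dialog_guidelines(stats):
    """Return the list of guideline violations for average dialog times (ms)."""
    findings = []
    response = stats["response"]
    if stats["wait"] >= 0.10 * response:
        findings.append("wait time is 10% or more of response time")
    if stats["load"] > 50:
        findings.append("load time is greater than 50 ms")
    if stats["roll_in"] > 20 or stats["roll_out"] > 20:
        findings.append("roll time is greater than 20 ms")
    if stats["enqueue"] >= 5:
        findings.append("enqueue time is 5 ms or more")
    if stats["processing"] >= 2 * stats["cpu"]:
        findings.append("processing time is twice the CPU time or more")
    if stats["database"] >= 0.40 * response:
        findings.append("database time is 40% or more of response time")
    if stats["gui"] > 100 or stats["net"] > 100:
        findings.append("GUI or net time is greater than 100 ms")
    return findings


# Illustrative averages only, not measurements from a real system.
healthy = {
    "response": 600, "wait": 10, "load": 20, "roll_in": 5, "roll_out": 5,
    "enqueue": 1, "processing": 200, "cpu": 150, "database": 200,
    "gui": 80, "net": 40,
}
slow_db = dict(healthy, response=1000, database=500)

print(check_dialog_guidelines(healthy))
print(check_dialog_guidelines(slow_db))
```

A check like this only points to the area to investigate (work processes, memory, database, network); the monitors described in this chapter are still needed to find the actual cause.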

Response Time in External Communications

Communications with other systems, like SAP CRM, SAP BW, SAP APO, or other applications and external systems, are taking on more and more importance within your SAP solution. That is why we are going to spend some time analyzing the performance of RFC communications and troubleshooting them in this topic. Remote Function Calls are used to communicate with other systems and applications, which is why RFC Time is also measured in the Workload Analysis Monitor. To analyze whether the response time of an external communication to another system is within your expectations, first observe the Roll Wait Time shown in transaction ST03N.

Roll Wait Time as indicator of response time for external communications

An RFC communication between two SAP systems is handled by two dialog work processes, one in each system. The dialog work process executing the RFC call has rolled in the user context of the user performing that request. Because the request involves an RFC call to another system, the dialog work process rolls out the user context and a dialog work process in the other system rolls it in again. This is necessary, among other reasons, for security purposes. You would not allow a nonauthorized user to log on to another system to perform a task, and the dialog work process can check the authorizations in the user context and verify that the user executing that call is authorized to perform such a task in the target system.

The "wait" time between the moment one dialog work process rolls out the user context and the moment the other one rolls it in is the Roll Wait Time. The higher this is, the longer the response time that the original user will experience.

High Roll Wait Times may be caused by all work processes in the target system being occupied. It is very important to configure the instances properly, especially when they must be designed to handle RFC communications.

Always ensure that you have sufficient work processes in each instance to handle both online and RFC workload, without forgetting that the number of dialog work processes that you can configure is not infinite and it depends on the hardware resources of the application server. In addition, there are certain parameter settings that you can configure in order to obtain optimal response time and balance the RFC workload properly. The following are the most critical parameters to configure:

  • rdisp/rfc_max_comm_entries specifies the maximum number of communications in an instance. No more dialog work processes will be given to the program calling the target system after this number is reached. It is a percentage and by default the value is 90 (%).
  • rdisp/rfc_min_wait_dia_wp tells the number of dialog work processes to be always available for online users. By default it is 1.

Therefore, by adjusting the above parameters and some others (parameter settings for high interface load are specified in SAP Note 74141), you can achieve optimal balancing of resources and avoid high response times in RFC communications.

Troubleshooting RFC Communications

Sometimes the RFC Times are high but the Roll Wait Times are not, and therefore you wonder where the time for that RFC communication is spent. The best way to analyze a specific transaction is to use the statistical records stored in your SAP system. Execute transaction STAD or from the Workload Analysis Monitor, select the Expert Mode view by clicking in the list box at the top left side of the screen, and then follow the path Detailed Analysis | Business Transaction Analysis. You may filter by user, by business transaction, program, specify a time period, group the results, and summarize all the dialog steps of one transaction or show all of them individually.

Statistical records in a SAP system

Note that this transaction substitutes transaction STAT in the newer SAP releases with the underlying Web Application Server. Once you have selected the statistical records according to your chosen criteria, the resulting list will provide you with all the details of the response time of each and every transaction (that corresponds to your selection criteria). Double-click on a line that contains an RFC transaction, which will be denoted by "RFC" under column Program and by "R" under column T (for Task Type).

The screen shown in the figure provides the details of the time spent in that particular work process executing that dialog step, which in this case also performed an RFC call to an external target. This is the granularity of these statistics! Click on the RFC icon to display the RFC call and the details of that communication.

Details of the response time of an RFC communication

Click on Server (the cursor will transform into a little hand) to get to the response time details. The remote execution time displayed is the actual time spent in the server that processed the RFC call. The calling time is the remote execution time plus the network time and the Roll Wait Time. Therefore, if you have eliminated the Roll Wait Time as a variable in your analysis of a possible performance problem, because you saw in the Workload Analysis Monitor that it was low, and the remote execution time is also low compared to the total calling time, the RFC call spends most of its time in the network.
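The relationship between these times can be written out as a small calculation. This is a minimal sketch that assumes the three values are read manually from the statistical record and given in milliseconds; the function name and the sample values are hypothetical.

```python
def rfc_network_ms(calling_ms, remote_execution_ms, roll_wait_ms=0.0):
    """Estimate the time an RFC call spent in the network.

    Based on the relationship described in the text:
    calling time = remote execution time + network time + Roll Wait Time.
    """
    network = calling_ms - remote_execution_ms - roll_wait_ms
    if network < 0:
        raise ValueError("components exceed the calling time; check the inputs")
    return network


# Illustrative values only: a 1200 ms call with 300 ms remote execution
# and 50 ms Roll Wait Time.
print(rfc_network_ms(1200, 300, 50))
```

If the result accounts for most of the calling time, the network between the two systems is the first place to investigate.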

It is very likely that somewhere in the network (and that would include the complete infrastructure, network, routers, etc.), there is a problem that is affecting negatively the communication between servers and affecting the performance of processes that need external communication.

Troubleshooting the Presentation Server Response Time

In SAP R/3 Enterprise, and starting with earlier releases (4.6B and 4.6C), the SAP Graphical User Interface (SAP GUI) was redesigned for a more user-friendly appearance, which you can personalize, and it is able to incorporate more complex elements, such as Web applications. This brings advantages, such as scrolling through the screen and accessing full menus that are downloaded to the front-end server, instead of each representing a dialog step, as in the old SAP GUI. However, the implications for performance are greater and the network traffic increases, as do the hardware requirements for the presentation server.

In addition, with the new SAP NetWeaver components, such as Enterprise Portals, accessing the back-end systems requires more powerful network bandwidth and front-end servers. You may analyze the GUI Time and the Net Time the same way you did before with the RFC Time, using the same tools, such as the Workload Analysis Monitor (ST03N, observing the GUI Time) and STAD, observing the GUI Time and the Net Time. If these times are excessive, check that the hardware requirements for the presentation server are met and that the network between the application servers and the presentation servers is not experiencing shortages or slow traffic.

To check the network between the application servers and the front-end servers, from a SAP system you may execute transaction OS01 or follow the path Tools | Administration | Monitor | Performance | Operating System | Network LAN Check with Ping. Alternatively, execute ST06 and choose Detail Analysis Menu | LAN Check by Ping. Double-click on Presentation Servers. You may select as many as you want with the Pick option and finally click on 10 x Ping. This will send several small data packets to the chosen presentation servers, verify whether they respond, and measure how long it takes for the packets to go and come back.

Depending on the infrastructure used and the network bandwidth available, these times may vary and you must take this into consideration to determine if the response times shown represent a problem in your system or not. The following are rules of thumb:

  • In a local area network (LAN) the network time is expected to be below 20 milliseconds.
  • In a wide area network (WAN) (for example, 256 or 384 KBit/sec), it is expected to be below 50 milliseconds.
  • In a WAN (for example, 128 KBit/sec or less) it is expected to be below 250 milliseconds.
  • Finally, losses of data packets should not occur at all.
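The rules of thumb above can be applied to ping results with a small helper. This is a sketch under assumptions: the link type labels ("lan", "wan_fast", "wan_slow"), the function name, and the input format are hypothetical and do not correspond to the output format of the LAN check transaction.

```python
# Expected upper bounds for the network time, in milliseconds, taken from
# the rules of thumb in the text. The keys are hypothetical labels.
EXPECTED_BELOW_MS = {
    "lan": 20,        # local area network
    "wan_fast": 50,   # WAN, for example 256 or 384 KBit/sec
    "wan_slow": 250,  # WAN, for example 128 KBit/sec or less
}


def assess_ping(link_type, avg_ms, packets_sent, packets_received):
    """Classify one presentation server as 'ok', 'slow', or 'packet loss'."""
    if packets_received < packets_sent:
        # Losses of data packets should not occur at all.
        return "packet loss"
    if avg_ms >= EXPECTED_BELOW_MS[link_type]:
        return "slow"
    return "ok"


print(assess_ping("lan", 3, 10, 10))
print(assess_ping("wan_fast", 80, 10, 10))
print(assess_ping("wan_slow", None, 10, 0))
```

Packet loss is checked first because an average time is meaningless (or missing) when no packets come back.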

The figure shows an example of each possible occurrence. One presentation server experienced an "impossible link," that is, a loss of 100% of the packets sent. This means that something may be wrong with the network between the application server sending the data and that presentation server. All elements in a network, including routers, have an influence and should be checked by your network administrator with network monitoring tools.

Checking the network between application and presentation servers

Another presentation server is experiencing high response times, and the same advice applies: verify possible causes of slow traffic between the servers. It is important to note that these presentation servers may be located in two different geographical locations and not in the same LAN, which is why the results may be so disparate from one to the other.

Analyzing Specific Business Transactions

There are several ways to check and analyze the statistics that are kept for specific business transactions from the Workload Analysis Monitor. As explained previously, you may execute STAD and check each individual business transaction. In addition, you may also choose a different view in the Workload Analysis Monitor. While in Administrator or Service Engineer view, in the left side of the screen within the Workload Analysis Monitor, choose Transaction Profile.

That view provides you with the top 15 business transactions that have been executed (this number can be customized) and their response times with a breakdown by component, so that you can analyze where most of the time is spent when executing each transaction.

Checking the response time of specific transactions with the Transaction Profile

This way you can check whether there are specific transactions that are experiencing performance problems. It is important to understand that the system may be performing well overall while only a few users who execute certain transactions are affected; that is, therefore, a specific problem that you can isolate and analyze in particular. Sometimes it is not easy to identify the business area to which a transaction belongs, and there is a useful trick for system administrators to find out. Execute program RSSTATUS in SE38, enter the name of the program or transaction you want, and press ENTER. The output will show you what area that program or transaction belongs to (for example, SD-Sales).

Troubleshooting Specific Performance Problems with the SQL Trace

There are certain tools that help you to analyze the performance of a particular program or transaction. One tool or another should be used, depending on what you observe in the Transaction Profile. If the statistics for that particular transaction or program are showing high database times, a SQL Trace can be useful to check for expensive SQL Statements caused by that transaction. A SQL Trace is started with transaction ST05 or Tools | Administration | Monitor | Traces | Performance Trace. Alternatively you can also use the new transaction code STKONTEXTTRACE.

Only one trace per Instance can be executed at a time. As a best practice, open two screens, one with the program or transaction that you have identified as performance impaired and another one with the SQL Trace ST05. If you are not tracing yourself, but another user, have the user work in only one screen executing that program or transaction and nothing else. Otherwise, the trace would not be useful, because everything that the user did would be traced.

Activate the trace (for yourself or using Activate with Filter selecting the user you want to trace). Run the program or transaction or have the user do it. When it is finished, deactivate the trace and click on Display Trace to observe the results. A filter can be used to display more or less data. For example, it is useful to display an Extended Trace list, which includes the program that passes the recorded SQL Statements to the database. Observe in the figure the results of an example of a SQL Trace. The records displayed are accesses to the database performed while executing the traced program. There are two important observations to make here. The first is that if you run this program again and trace it, it is likely that several records will not show up in the trace results. Why? Because some tables will probably have been buffered into the R/3 buffers the first time you ran the program. The second is that only accesses to the database are recorded by this trace.

Example of a SQL Trace

The Duration column specifies how long that particular SQL statement took to execute. If this time exceeds 150 milliseconds (note that the time is actually recorded in microseconds, so the threshold is 150,000 microseconds!), the duration is highlighted in red, indicating that the statement is considered expensive. You do not need to treat this threshold as written in stone. Many factors influence the access to the data in a database, such as your hardware resources, the type of data being accessed, the disk devices, and the network (if the database server is separate from the application server where you started the trace), so this threshold may vary.
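The unit conversion is easy to get wrong, so here is a minimal sketch that applies the 150-millisecond guideline to durations recorded in microseconds. The record layout, the function name, and the sample values are hypothetical; the table names are used only as sample labels, and this is not the export format of the SQL Trace.

```python
# 150 milliseconds expressed in microseconds, the unit used by the trace.
EXPENSIVE_THRESHOLD_US = 150 * 1000


def expensive_statements(records, threshold_us=EXPENSIVE_THRESHOLD_US):
    """Return the records whose duration exceeds the threshold, slowest first."""
    flagged = [r for r in records if r["duration_us"] > threshold_us]
    return sorted(flagged, key=lambda r: r["duration_us"], reverse=True)


# Illustrative records only.
trace = [
    {"table": "VBAK", "duration_us": 1_200},
    {"table": "VBAP", "duration_us": 420_000},
    {"table": "MARA", "duration_us": 150_000},
    {"table": "KNA1", "duration_us": 180_500},
]

for record in expensive_statements(trace):
    print(record["table"], record["duration_us"] / 1000, "ms")
```

The threshold is a parameter on purpose, since the text notes that the right value depends on the hardware, the data, and the network of each installation.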

Once you have identified that a particular statement is expensive, your goal is to find out why by evaluating the access path that the database engine chose to access that data. Then eliminate the reason why that access path was not suitable, and with it what can be a very important performance problem. Sometimes, a secondary index on the affected table may help. Other times, buffering the table may help as well. Quite often, the selection criteria on the screen have not been well defined by the user, so a lot of data is requested unnecessarily. Finally, it may be necessary to change the code (especially in custom-developed programs) to provide a more suitable way to access the data and avoid overloading the database.

Keep in mind that by identifying these performance issues in particular transactions or programs that are executed by the user community, you may be able to make the day of many people and ease their job quite a lot.

Troubleshooting Specific Performance Problems with an ABAP Trace

If the processing time of a transaction or program that you have identified in the Transaction Profile of the Workload Analysis Monitor is high, an ABAP Trace may be useful to identify exactly which function modules in that program or transaction are the cause of such lengthy times and therefore tune them. An ABAP Trace is started with transaction SE30 or Tools | ABAP Workbench | Test | Runtime Analysis.

From this screen you can execute a program or transaction that you choose. Once you have typed the name of the program or transaction you want to trace, click on Execute and you will be taken to the screen of that program or transaction. The trace is activated automatically. Finish the task and go back with the green arrow until you reach the main SE30 screen. Click on Analyze to display the file. A graphic is displayed showing where most of the time is spent: database, system, or ABAP.

To obtain a list of all the function modules, click on Hit List and sort the list by Net Time in order to identify the most expensive function modules, as shown in the figure. The execution time of the top ones differs greatly from that of the rest. These function modules are candidates for your developers to examine for potential improvement.
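The same ranking can be reproduced outside the system once the hit list values are noted down. This is only a sketch: the entry layout, the function name, and the function module names are hypothetical, and the net times are invented values in microseconds.

```python
def top_by_net_time(hit_list, count=3):
    """Return (name, net time, share of total in %) for the most expensive entries."""
    total = sum(entry["net_us"] for entry in hit_list)
    if total == 0:
        return []
    ranked = sorted(hit_list, key=lambda entry: entry["net_us"], reverse=True)
    return [
        (entry["name"], entry["net_us"], round(100 * entry["net_us"] / total, 1))
        for entry in ranked[:count]
    ]


# Illustrative hit list only; the names are hypothetical.
hits = [
    {"name": "Z_READ_ORDERS", "net_us": 700},
    {"name": "Z_FORMAT_LIST", "net_us": 100},
    {"name": "Z_CHECK_PRICES", "net_us": 200},
]

for name, net_us, share in top_by_net_time(hits, count=2):
    print(name, net_us, share)
```

Showing the share of the total next to the absolute value makes it easier to see when one or two function modules dominate the runtime.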

Results of an ABAP Trace (Runtime Analysis Evaluation)

An important tip for developers can be found in the main screen of the ABAP Trace. Click on Tips and Tricks. You will be taken to a tool where you can measure the runtime of different ABAP commands that return the same data, yet when you compare the runtime results, they will probably differ greatly. Documentation is also included, so developers can learn to program efficiently. In conclusion, the ABAP Trace and the SQL Trace are particularly useful tools while creating your own custom code and it is best practice for ABAP developers to be proficient at using them in order to achieve optimal performance in user exits and other in-house developed code.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd