Managing the environment - IBM Cognos

IBM Cognos BI includes a variety of system management features. Chief among them are the metrics exposed in the System task, which cover the various components that make up the IBM Cognos BI environment:

  • Services
  • Dispatchers
  • Servers
  • Server groups

These metrics provide administrators with better insight into the status and overall health of the various components that make up the business analytics solution. The system metrics are broken into three dimensions:

  • Individual metric
  • A metric group
  • The service to which these pertain

Individual metrics

Key individual metrics that are monitored as part of a solid system management methodology are:

  • AverageTimeInQueue

    The average time in queue is calculated based on the total amount of time that all requests have spent in the queue divided by the total number of requests that have been in the queue. This averaged value maps to the latency metric in the administration console.

  • FailedRequestPercent and SuccessfulRequestPercent

    These metrics display the percentage of failed or successful requests that have occurred. The calculation is simple: (Number of failed or successful requests / number of received requests) * 100

    These percentage metrics are excellent metrics to monitor when resetting the metrics is either not possible or is not going to be done. The reason for this is that tolerance thresholds built on the percentage metrics are always relevant regardless of when the last reset occurred. With count metrics for failed and successful requests (Number Of Failed Requests and Number Of Successful Requests), thresholds are eventually reached after a period of time, and never return to a green status.

    For example, if the threshold score is set to turn red after 50 failed requests, after the threshold is exceeded (one day, one week, one month, and so on), the threshold score is always red until the service is restarted or the metrics are reset. The percentage metrics change over time.

    Using the previous example, if the failed requests hit 50 after the first 50 requests, the value is 100% and more than likely results in a threshold score of red. From that point forward, if every request is successful, the metric value decreases, thus moving the red score to yellow and then eventually green. Due to this, if only the percentage metrics are monitored through thresholds, no resetting of metrics ever needs to occur.

  • MillisecondsPerSuccessfulRequest

    This is the average amount of time spent processing a successful request.

  • NumberOfFailedRequests

    Number of failed requests, not to be confused with the failed request percentage metric, is a cumulative count of the number of failed requests that have occurred since the last reset.

  • NumberOfProcessedRequests

    This specifies the number of received requests that have been processed by the dispatcher.

  • NumberOfRequests

    This specifies the number of requests that have passed through the queue since the last time that the metrics were reset. This value maps to the number of queue requests in the administration console.

  • NumberOfSessions

    This indicates the number of user sessions that are currently active in the environment.

  • NumberOfSessionsHighWaterMark

    This indicates the maximum number of user sessions that were active in the environment at one time.

  • NumberOfSuccessfulRequests

    Number of successful requests, not to be confused with the successful request percentage metric, is a cumulative count of the number of successful requests that have occurred since the last reset.

  • QueueLengthHighWaterMark

    The value for this metric is an indication of what the highest amount of requests in the queue has been since the metric was last reset.

  • ResponseTimeHighWaterMark

    The value for this metric shows the longest period of time spent processing a request, either successful or failed.

  • ServiceTimeAllRequests

    The service time metrics show the amount of time that was spent processing the requests. This particular metric value is the total amount of processing time that was used for all requests, including both failed and successful.

  • ServiceTimeFailedRequests

    This metric value is the total amount of processing time that was used for all failed requests.

  • SuccessfulRequestsPerMinute

    The definition of this metric differs slightly from the traditional definition, or perception, of successful requests per minute. This value does not indicate an ongoing average from minute to minute, but rather it is an indication of how many requests have been processed during the amount of time that the system has spent processing them. The formula, as implied by the example that follows, is: (Number of successful requests / seconds spent processing those requests) * 60

    For example, 10 requests are executed successfully and the server has spent 30 seconds executing the requests. When looking at the metric after a minute, the traditional definition would indicate the average is 10 requests per minute. After the second minute, the value would be 5, and so on. The actual use of this metric in IBM Cognos BI would be 20 after one minute and would still be 20 after 2 minutes.

    This calculation bases the average number of successful requests on the amount of processing time that it took to execute them, not on the elapsed clock time. This is done to provide a real value that is not impacted by periods of inactivity. This metric is a great way to track server throughput.

  • TimeInQueue

    This cumulative metric shows the total amount of time that has been spent by all objects in the queue. For example, if 30 requests have been in the queue at some point, each with a queue time of 1.5 seconds, the value for the metric is 45, as the total time spent (30 * 1.5) is 45 seconds.

  • TimeInQueueHighWaterMark

    This displays the longest amount of time that one object has spent in the queue.
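The derived metrics in the list above can be reproduced from the raw counters. The following Python sketch is illustrative only: the function names and the idea of passing the counters in as plain numbers are assumptions made for this example and are not part of any IBM Cognos API. It applies the calculations as described above, including the successful-requests-per-minute calculation implied by the example of 10 requests processed in 30 seconds.

```python
def average_time_in_queue(time_in_queue_seconds, number_of_requests):
    """Total time spent in the queue divided by the number of queued requests."""
    if number_of_requests == 0:
        return 0.0
    return time_in_queue_seconds / number_of_requests


def request_percent(request_count, received_requests):
    """(failed or successful requests / received requests) * 100."""
    if received_requests == 0:
        return 0.0
    return request_count * 100 / received_requests


def successful_requests_per_minute(successful_requests, processing_seconds):
    """Throughput based on processing time rather than elapsed clock time."""
    if processing_seconds == 0:
        return 0.0
    return successful_requests * 60 / processing_seconds


# Values taken from the examples in the text above.
queue_average = average_time_in_queue(45, 30)         # 30 requests, 45 s in total
failed_percent = request_percent(50, 50)              # 50 failures in 50 requests
throughput = successful_requests_per_minute(10, 30)   # 10 requests in 30 s
print(queue_average, failed_percent, throughput)
```

Because the throughput calculation divides by processing time, waiting another idle minute before reading the metric does not change the result, which matches the behavior described for SuccessfulRequestsPerMinute.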

Metric groups

The individual metrics are divided into three main metric groups:

  • Request

    These metrics pertain to the specific requests that are handled by each component in the environment. Notable metrics that are included in this group are:

      • Amount of processed requests
      • The percentages of successful versus failed requests
      • The amount of processing time for these requests

  • Queue

    These metrics provide insight into the amount of requests that are not handled immediately and therefore are placed into a queue to be processed when the resources become available. Several metrics in this group are the amount of requests that have been in the queue, the length of the queue, and how much time requests have spent in the queue.

  • Process

    These metrics display information regarding the number of processes required by the product to function. Metrics such as the number of current processes and the maximum number of processes that were spawned are available.

There are a few metrics located outside of the three main metric groups (JVM uptime and heap size information, for example), but the majority of the individual metrics are a part of the three metric groups.

Services

The final dimension to the system metrics is how the individual metrics and metric groupings relate to the service with which they are associated. Understanding what actions are performed by each of the services provides greater insight into the values that are being reported.

The system task displays metrics at all of the levels of the topology. The metrics are collected at the service level, which is the lowest level in the topology. From the service level, metrics are then consolidated through the rest of the topology: services to dispatcher, dispatchers to the server, and then the servers to the system level. Therefore, requests to the individual services affect the metric values at the higher server and system levels.

This is an important fact when working with environments that are made up of multiple servers or dispatchers. When viewing the metrics scorecard that is available as part of the system task in the administration console, the green, yellow, and red traffic light indicators reflect the most severe indication value in the hierarchy. That is, the poorest indication rises to the top. This was done to provide administrators with the ability to visually see, at a glance, whether any key metrics are not performing as well as expected without having to drill down to lower levels of the hierarchy.
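The "poorest indication rises to the top" behavior can be pictured as a recursive roll-up over the topology. The sketch below is a hypothetical model written for illustration: the nested dictionary layout, the server and dispatcher names, and the function name are assumptions, and it does not reflect how IBM Cognos BI stores or computes the scorecard internally.

```python
# Higher numbers are more severe.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def rolled_up_status(node):
    """Return the most severe status found on a node or any of its descendants."""
    worst = node.get("status", "green")
    for child in node.get("children", []):
        child_status = rolled_up_status(child)
        if SEVERITY[child_status] > SEVERITY[worst]:
            worst = child_status
    return worst


# Hypothetical topology: system -> servers -> dispatchers -> services.
system = {
    "name": "System",
    "children": [
        {"name": "server1", "children": [
            {"name": "dispatcher1", "children": [
                {"name": "ReportService", "status": "green"},
                {"name": "BatchReportService", "status": "yellow"},
            ]},
        ]},
        {"name": "server2", "children": [
            {"name": "dispatcher2", "children": [
                {"name": "ContentManagerService", "status": "green"},
            ]},
        ]},
    ],
}

print(rolled_up_status(system))
```

In this model a single yellow service is enough to turn its dispatcher, its server, and the system yellow, while the second server stays green.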

The metric values are updated in real time and reside in an MBean within the dispatcher Java process. Because the metrics are live, they are dynamic and are only reset when the IBM Cognos BI service or process is restarted or when an explicit request, either manual or programmatic, is executed.

As the metrics are dynamic and reside as part of the dispatcher, they are volatile and thus are reset every time that the service or server is restarted. In certain situations it might be desired to have the metrics reset without restarting the entire application. This is possible by using the Reset button beside the metric grouping name in the Metrics dialog box. When the Reset is clicked, all of the metrics that belong to the service in context are reset. If a higher level is in context, such as a server or the system, all of the metrics that pertain to that object are also reset.

Reset the request metrics as they pertain to the Content Manager service

An important note regarding the metrics, and the administration console in general, is that there is no auto-refresh feature. This was a design decision based on general feedback from administrators, who found that values changing on a regular basis limited their ability to thoroughly analyze a series of metrics.

That said, there are a few manual refresh options available within the administration console:

  • Fragment refresh

    This is the button located in the upper-right corner of the fragment that refreshes the values of the contents in the frame. For example, in the system task, refreshing the metrics frame updates the metric values, but does not change the contextual object or refresh any of the values in any of the other windows.

    Refresh button will retrieve the current values for the metrics

  • Page refresh

    This button is available on the main IBM Cognos BI toolbar located at the top of the browser page. When clicked, the entire page being viewed is refreshed. For example, clicking this button when viewing the system task refreshes the scorecard, metrics, and settings frames without losing context.

    Button to refresh the entire page

  • Browser refresh

    Using the browser refresh button to update the administration console causes all pages to be refreshed. This action causes all context to be lost and, after the refresh has occurred, the default view allowed by your administrative capabilities displays.

To provide a reference as to the timeliness of the information being viewed, there is a summary bar at the bottom of each frame that displays the last time that a refresh occurred.

Bottom of frame indicates when the values were generated and displayed

Metric tolerance thresholds

Whereas the ability to have real-time metrics displayed in the administration console provides valuable information when monitoring the environment, the value all but disappears when not actively in the console watching the metrics. With this in mind, it is possible to manually set a tolerance threshold on the individual metrics, which can provide the basis for automated alerts through IBM Cognos Event Studio, or by creating self-service personal alerts in IBM Cognos Viewer.

These thresholds allow administrators to set ranges that provide them with a quick overall view into the system health. The current metric values are displayed in the system task as green, yellow, and red traffic light indicators, based on the range in which the value resides. When a series of thresholds is assigned to key indicators that pertain to the specific environment, an overall scorecard is possible.

System Scorecard showing all dispatchers

A quick glance at the scorecard indicates that there is a dispatcher that has a yellow indicator, which means that certain underlying service metrics have values that are nearing the limits of the acceptable range and might warrant further investigation.

Creating a threshold

Before an overall scorecard can be established, tolerance thresholds must first be defined on the key metrics:

  1. Launch the IBM Cognos administration console.
  2. Select the System task on the Status tab.
  3. In the scorecard frame, drill down on an available server.
  4. Drill down again on an available dispatcher to reveal the underlying services.
  5. Click BatchReportService.

Selecting the BatchReportService object changes the focus of the metrics frame on the upper-right side so that all of the relevant metrics for the batch report service are displayed.

Request metrics for the batch report service

The Metrics - BatchReportService frame consists of two metric groupings:

  • Process, which provides metrics about the number of configured and running batch report processes
  • Request, which displays metrics about the number and duration of requests handled by the batch report service

Take the following steps:

  1. Expand the Request metric grouping by clicking the plus sign (+).
  2. Locate the percentage of failed requests metric and click the pencil icon.
  3. The “Set thresholds for metric - Request - Percentage of failed requests” dialog box opens. This dialog box is divided into two sections. The first section is the Performance pattern, which specifies whether high, middle, or low values are good (a green traffic light indicator). The second section is where the actual threshold values or ranges that will drive the type of indicator are defined.
  4. Because failed requests are undesirable, select Low values are good.
  5. Enter a value in both of the boxes to define the range. In this example, the indicator light is green until the value for the metric reaches 3%, then it turns yellow. If the metric value continues to increase and reaches 5%, the indicator light changes to red.

    Defining a metric threshold

    Light indicator: The yellow indicator light value is 3.0%, which means that at 3% the indicator changes from green to yellow. Clicking the down arrow beside 3.0% moves the box down to the green threshold, which results in 3.0% remaining green, but anything higher changing the status to yellow.

  6. Click OK to save the threshold.

Returning to the metrics frame, the status indicator for the newly created metric threshold is displayed.

Metrics frame with newly created metric performance pattern
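The threshold logic for a "Low values are good" performance pattern can be sketched as a small function. This is an illustration under stated assumptions, not the product's implementation: the function name is hypothetical, the boundaries are treated as inclusive (yellow at exactly 3%, red at exactly 5%) as in the example above, and the count threshold of 25 for yellow is invented for the demonstration. The loop also shows why a percentage metric can recover to green while a cumulative count metric cannot.

```python
def indicator(value, yellow_at, red_at):
    """Traffic light for a pattern in which low values are good."""
    if value >= red_at:
        return "red"
    if value >= yellow_at:
        return "yellow"
    return "green"


# 50 failed requests occur early; every later request succeeds.
failed_requests = 50
history = []
for received_requests in (50, 1250, 2000):
    failed_percent = failed_requests * 100 / received_requests
    percent_light = indicator(failed_percent, 3.0, 5.0)   # thresholds from the steps above
    count_light = indicator(failed_requests, 25, 50)      # hypothetical count thresholds
    history.append((received_requests, percent_light, count_light))

for row in history:
    print(row)
```

The percentage indicator moves from red through yellow to green as successful requests accumulate, whereas the count indicator stays red until the metrics are reset or the service is restarted.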

After you define metric thresholds, if auditing is enabled for the IBM Cognos BI version 10.1 environment, any threshold exceptions (changes to the indicator light color) are written to the COGIPF_THRESHOLD_VIOLATIONS table. This table allows administrators to proactively create IBM Cognos Event Studio agents that monitor the audit table for exceptions and that notify the required administrators. The audit database entries also provide the information that is required to report on the volume and severity of the exceptions over time. This reporting is key in indicating peak periods of usage and keeping track of unexpected changes to usage patterns.

Programmatically setting thresholds

One of the most common questions is why there are no default thresholds. Because many factors influence metric values, such as the size of the user base, the number of concurrent users, the volume of reports being executed, server hardware, and available memory, it is impossible to provide defaults that would be relevant in all environments.

Not only can setting metric thresholds be a time-consuming exercise given the volume of metrics available, but it also raises the question of which values make sense for the environment in question. Are the values being used based on the current settings? If so, are these values indicative of a typical day? What about during peak periods? Won’t all of the metric thresholds turn red during peak period usage?

There is help for this. Using the metric export capability, it is possible to gather metric exports for a period of time, typically spanning a high reporting period when available, and then use those exports to set the threshold ranges programmatically through the SDK, based on the real-world usage statistics for a given environment.
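One way to turn exported values into threshold ranges is to take percentiles of the observed values. The sketch below is only an assumption about how such a calculation might be done: the choice of the 90th and 99th percentiles, the function names, and the sample values are all hypothetical, and the SDK call that would actually apply the thresholds is deliberately not shown.

```python
def percentile(samples, percent):
    """Nearest-rank percentile (percent is an integer from 1 to 100)."""
    ordered = sorted(samples)
    # Ceiling of percent * n / 100 using integer arithmetic only.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def suggest_thresholds(samples, yellow_percent=90, red_percent=99):
    """Suggest yellow and red boundaries for a 'low values are good' metric."""
    return {
        "yellow": percentile(samples, yellow_percent),
        "red": percentile(samples, red_percent),
    }


# Hypothetical queue-length values gathered from metric exports.
exported_queue_lengths = [2, 3, 1, 4, 2, 8, 3, 2, 12, 5]
suggested = suggest_thresholds(exported_queue_lengths)
print(suggested)
```

Basing the boundaries on values observed during a peak period keeps the indicators from turning red during normal busy hours, which is the concern raised above.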

Reacting to bottlenecks due to unexpected events

System administrators are occasionally faced with unexpected environmental events or unforeseen changes to usage patterns. Reacting to these occurrences is critical to maintaining service level agreements and ensuring that the data in reports is accurate and not out of date.

Using a Great Outdoors company scenario, Sam Carter, the IBM Cognos Administrator, receives a message indicating that there will be a 60-minute outage of the reporting database from 1 p.m. to 2 p.m. due to scheduled routine maintenance. To ensure that any reports scheduled during this time do not fail, Sam must react to the situation by taking the following steps:

  1. Launch the IBM Cognos administration console.
  2. Select the Upcoming Activities task on the Status tab.
  3. Select the 1 p.m. to 2 p.m. time slot on the chart by clicking the bar above 13 along the x-axis, which changes the list display at the bottom of the page.
  4. Select all of the entries by clicking the check box in the upper-left corner of the list, or if only certain entries will be impacted by the database outage, manually select the objects or use the Advanced options on the left to filter the results to display only the impacted objects and select all.
  5. After you select the objects, use the Actions drop-down menu beside one of the selected items to choose the Suspend option or use the Suspend button from the toolbar menu.
  6. In the Suspend Activities dialog box, select Until, and change the calendar control to reflect a period when the reporting database is back online. For example, the reports can be shifted to 3 p.m. to compensate for any additional minor delays.
  7. Click OK.
  8. The Upcoming Activities chart updates to reflect the changes.

The scenario describes reacting to a planned database outage, but the ability to suspend reports for a predefined period of time can also be used to help spread an unexpected scheduling load across other less active periods. The ability to allocate scheduled objects throughout the day can help ensure system throughput, so it is a good practice for IBM Cognos administrators to proactively examine the upcoming load for the day and make changes where necessary.

Building on the previous scenario, Sam Carter receives word that there are complications with the database maintenance and that the outage will last longer than the previously anticipated 60 minutes. Unfortunately, there is no new estimate as to when the database will be back online and available for reporting. Sam must now react differently to this unforeseen hurdle because there is no estimated time for the resolution. He must take the following steps:

  1. Launch the IBM Cognos administration console.
  2. Select the Upcoming Activities task on the Status tab.
  3. Click the bar above 15 along the x-axis to change the filtered list to the objects scheduled to run from 3 p.m. to 4 p.m.
  4. Select the objects that had been previously rescheduled.
  5. Using the Actions drop-down menu beside one of the selected items, or using the toolbar menu, choose Suspend for the selected items.
  6. In the Suspend Activities dialog box, select Indefinitely. Click OK.

Contrary to when the schedules were postponed and a new time was defined in the first scenario, the rescheduled items do not appear in any specific time slot in the chart. Because they were suspended indefinitely, there is no defined time for their execution, so there is no way to represent them on the chart. To identify that there are items that are suspended indefinitely, there is an entry in the chart legend called Suspended that indicates the number of suspended objects.
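The difference between the two scenarios can be illustrated with a small model of the chart. This is a hypothetical sketch, not product code: the activity names, the tuple layout, and the function name are assumptions. An activity suspended until a given time still has a run time and lands in an hourly slot, while an indefinitely suspended activity has no run time and can only be counted under the Suspended legend entry.

```python
from collections import Counter
from datetime import datetime


def chart_buckets(activities):
    """Count activities per start hour; those with no run time count as Suspended."""
    buckets = Counter()
    for _name, run_at in activities:
        if run_at is None:
            buckets["Suspended"] += 1
        else:
            buckets[run_at.hour] += 1
    return buckets


# Hypothetical schedule after the second scenario.
activities = [
    ("Daily sales report", datetime(2011, 3, 7, 15, 0)),  # suspended until 3 p.m.
    ("Inventory report", None),                           # suspended indefinitely
    ("Revenue report", None),                             # suspended indefinitely
]
buckets = chart_buckets(activities)
print(buckets)
```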

Upcoming activities chart with legend

System trending

Through the combination of viewing the metrics available in the administration console and proactively monitoring metric thresholds using IBM Cognos Event Studio alerts, it is possible to quickly respond to unexpected changes in usage patterns.

But how is the system doing today in comparison to last week or last month? Is the system handling more or fewer requests than last month? Are the response times and queue lengths increasing due to higher system usage? Because the metrics in the console are essentially a current snapshot of the environment, it becomes almost impossible to answer these questions without a mechanism to record the metric values.

There are a couple of ways to record the metrics. The first mechanism uses product functionality to export the metrics to a text file and then uses an extract, transform, and load (ETL) tool to load them into a reporting database. After they are loaded into the database, reports can be created so that key metrics can be analyzed and tracked. This is a continual process of exporting and then loading the metrics so that the reports are always current.

The other mechanism is to use a tool that is both capable of connecting to the Java environment directly and writing the values to the reporting database. One such tool is IBM Tivoli Directory Integrator, which connects to the IBM Cognos VM, reads the metrics from a customizable list of services, and then writes out the values to a relational database. This process is automated and can be scheduled so that there is no manual intervention required, and the metric values can be written to the database at almost any interval.
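The export-and-load approach can be sketched with the Python standard library. Everything specific here is an assumption made for illustration: the tab-delimited layout (timestamp, service, metric, value), the table name metric_history, the sample values, and the use of an in-memory SQLite database in place of a real reporting database. The actual export format produced by the product may differ.

```python
import csv
import io
import sqlite3

# Hypothetical export contents: timestamp, service, metric, value.
SAMPLE_EXPORT = (
    "2011-03-07T10:00:00\tReportService\tMillisecondsPerSuccessfulRequest\t120\n"
    "2011-03-07T16:00:00\tReportService\tMillisecondsPerSuccessfulRequest\t180\n"
    "2011-03-14T10:00:00\tReportService\tMillisecondsPerSuccessfulRequest\t200\n"
    "2011-03-14T16:00:00\tReportService\tMillisecondsPerSuccessfulRequest\t260\n"
)


def load_metrics(connection, export_text):
    """Parse the delimited export and append its rows to the history table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS metric_history "
        "(recorded_at TEXT, service TEXT, metric TEXT, value REAL)"
    )
    reader = csv.reader(io.StringIO(export_text), delimiter="\t")
    rows = [(ts, service, metric, float(value)) for ts, service, metric, value in reader]
    connection.executemany("INSERT INTO metric_history VALUES (?, ?, ?, ?)", rows)
    connection.commit()


def daily_average(connection, service, metric):
    """Average value per calendar day for one metric of one service."""
    cursor = connection.execute(
        "SELECT substr(recorded_at, 1, 10) AS day, AVG(value) "
        "FROM metric_history WHERE service = ? AND metric = ? "
        "GROUP BY day ORDER BY day",
        (service, metric),
    )
    return cursor.fetchall()


connection = sqlite3.connect(":memory:")
load_metrics(connection, SAMPLE_EXPORT)
trend = daily_average(connection, "ReportService", "MillisecondsPerSuccessfulRequest")
print(trend)
```

With the history in a table, questions such as "is the response time higher than last week?" become ordinary queries that a report can run on a schedule.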

Consuming system metrics from external tools

This section discusses Java Management Extensions and JConsole.

Java Management Extensions

Java Management Extensions (JMX) is a Java technology that supplies tools for managing and monitoring applications and service-oriented networks. These resources are represented by objects called MBeans (managed beans), each of which represents a resource running in the Java Virtual Machine (JVM).

How this translates to the IBM Cognos topology is that the dispatcher component stores the raw metrics in an MBean within the JVM running the IBM Cognos BI application. Besides the administration console, the metrics are accessible externally by using the industry-standard JMX. Thus, for tools such as IBM Tivoli Monitoring to connect to the IBM Cognos metrics, a JMX agent must be created to interface with the IBM Cognos MBean.

Connecting to metrics using JConsole

This section describes the steps required to view the metrics externally using the JConsole application, which is available as part of the Java V1.5 JDK package. Before the metrics can be exposed to external sources, the Java MBean must first be made available for external access. To accomplish this, a parameter must be added to one of the files within the application server.

For a default Tomcat installation, complete the following steps:

  1. Navigate to the <install_dir>\webapps\p2pd\WEB-INF directory and open the p2pd_deploy_defaults.properties file in a text editor. Uncomment the existing rmiregistryport line by removing the # symbol (and modify the port number if required).

    Enabling the rmi registry port for JMX support

  2. Save the file.
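The edit in step 1 can also be scripted. The following Python sketch is a minimal illustration under stated assumptions: the function name is hypothetical, the sample file contents (including the other.setting line and the default value 9999) are invented for the demonstration, and the exact layout of the commented line in a real p2pd_deploy_defaults.properties file may differ. Back up the real file before changing it.

```python
import re

# Matches a commented-out rmiregistryport line without crossing line boundaries.
_COMMENTED_PORT = re.compile(
    r"^[ \t]*#[ \t]*(rmiregistryport[ \t]*=[ \t]*)(\d+)[ \t]*$",
    re.MULTILINE,
)


def enable_rmi_registry_port(properties_text, port=None):
    """Uncomment the rmiregistryport line and optionally replace the port number."""
    def _uncomment(match):
        new_port = str(port) if port is not None else match.group(2)
        return match.group(1) + new_port

    return _COMMENTED_PORT.sub(_uncomment, properties_text)


# Hypothetical file contents used only for illustration.
sample = "# JMX settings\n#rmiregistryport=9999\nother.setting=value\n"
print(enable_rmi_registry_port(sample))
```

Lines that do not match the commented rmiregistryport pattern, including other comments, are left untouched.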

It is important to note that there is no security associated with the JMX implementation, so after the entry has been added to the file, anybody can connect to the MBean if the proper connection string is known. That is, product access to the metrics can be locked down through the security policies in IBM Cognos Connection and the administration console, but these policies do not apply when connecting externally.

To require a user name and password for external access:

  1. Open IBM Cognos Configuration.
  2. In the explorer frame, select Environment.
  3. Locate the External JMX Port property in the Environment - Group Properties dialog box and type 9999 (to match the port number used for the rmiregistryport entry from a previous step).

    Port number note: The number specified in the rmiregistryport entry is a port number. Ensure that the port number specified is available for use and is not being occupied by another application.

  4. Click the value field of the External JMX credential property, and then click the pencil icon.
  5. On the “Value - External JMX credential” dialog box, specify the user ID and password that will be used to secure the IBM Cognos MBeans.

    Securing the IBM Cognos JMX interface

  6. Click OK.
  7. Save the new configuration parameters.

    Because this is a setting that is read when the application is started, a restart is required if the IBM Cognos BI application is already running.

After you start or restart the application:

  1. Locate the jconsole.exe executable in the bin directory of the Java JDK and launch it.

    JMX implementation note: The JMX implementation does not allow for spaces in the install path when using a JRE other than the default IBM JRE provided with the IBM Cognos installation. If there are spaces in the installation path and a JRE other than the default IBM JRE is being used, JConsole must be executed using the following command line:

    jconsole -J-Djava.rmi.server.useCodebaseOnly=true

  2. When presented with the JConsole: Connect to Agent dialog box, switch to the Advanced tab. Connect to the following JMX URL:

    Connecting to IBM Cognos using JMX

    Connection note: The machine_name entry must be the server name and cannot be localhost.

  3. Supply the proper credentials if the External JMX credential value was supplied in IBM Cognos Configuration.
  4. Click Connect to connect to the system metrics using JMX.
  5. Ensure that the MBeans tab is selected (if not selected by default).
  6. Expand the com.cognos section of the tree. This is the location of all of the metrics that reside in the administration console.
  7. View the metrics for a service, for example, report service, by expanding the reportService entry to expose the objects beneath.
  8. If more than one dispatcher is present in the environment, select one of them and expand the dispatcher name entry in quotation marks.
  9. Click the Metrics option beneath the dispatcher, which displays all of the metric names and values in the right frame.

To compare the metrics displayed in the IBM Cognos administration console and JConsole:

  1. Open a web browser session, launch IBM Cognos BI, and navigate to the System task within the administration console.
  2. Drill into the same dispatcher as used in the JConsole interface.
  3. Keep drilling down until the ReportService entry is located.
  4. Click ReportService to filter the metrics in the upper-right frame. Provided that no additional report service requests were made, the values displayed in the Metrics - ReportService frame are identical to the values displayed in JConsole, although there might be slight formatting differences in the figures.

    Report service metrics in IBM Cognos administration console

    Identical report service metrics in JConsole
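When the comparison is done on values copied out of both tools, formatting differences can be normalized before comparing. The sketch below is hypothetical: the function names, the sample values, and the kinds of formatting difference handled (thousands separators and a trailing percent sign) are assumptions made for illustration, and the metric names are taken from the list of individual metrics earlier in this article.

```python
def normalize(value):
    """Convert a displayed value such as '1,234' or '3.0%' to a float."""
    text = str(value).strip().replace(",", "").rstrip("%")
    return float(text)


def differing_metrics(console_values, jmx_values, tolerance=1e-9):
    """Return metric names that are missing on one side or differ in value."""
    differing = []
    for name in sorted(set(console_values) | set(jmx_values)):
        if name not in console_values or name not in jmx_values:
            differing.append(name)
        elif abs(normalize(console_values[name]) - normalize(jmx_values[name])) > tolerance:
            differing.append(name)
    return differing


# Hypothetical values as they might be read from each tool.
console = {"NumberOfProcessedRequests": "1,234", "FailedRequestPercent": "3.0%"}
jconsole = {"NumberOfProcessedRequests": 1234, "FailedRequestPercent": 3.0}
print(differing_metrics(console, jconsole))
```

An empty result means the two views agree once formatting is ignored. A non-empty result usually means that additional requests were processed between the two readings.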

