How to Go About Measuring Accuracy - ITIL Configuration Management

Now that you’re armed with a well-considered way to define accuracy, you can move on to measuring the accuracy of the CMDB. Defining accuracy was about policy and documentation, whereas measurement is about practical action. This section describes in detail how to go about measuring accuracy in your environment.

Counting Numbers of Errors

The first part of measuring the accuracy of your CMDB is to count the number of errors. You know what an error is, but how do you find errors? The most reliable way is by comparing the data in the CMDB with other data to see whether there are differences. You can think of these as data audits or spot checks.

The sample size you use to conduct these spot checks will shrink as you find higher levels of accuracy and grow if you start to see less accuracy. A good rule of thumb is to spot-check 5 percent of your database. If you are finding very few errors with this sample, you should assume your overall CMDB accuracy is good and reduce the sample size to 2 percent or so. On the other hand, if your original sample of 5 percent shows numerous errors, grow the sample size to 10 percent for the next spot check and continue to grow until you’ve identified the major sources of errors.
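The rule of thumb above can be sketched as a small function. This is only an illustration: the text does not say how many errors count as "very few" or "numerous", so the `low_error_rate` and `high_error_rate` thresholds, and the choice to double the sample once it has passed 10 percent, are assumptions you would replace with your own policy.

```python
def next_sample_fraction(current_fraction, errors_found, items_checked,
                         low_error_rate=0.01, high_error_rate=0.05):
    """Suggest the fraction of the CMDB to spot-check next time.

    low_error_rate and high_error_rate are illustrative thresholds,
    not values taken from the text.
    """
    if items_checked <= 0:
        raise ValueError("items_checked must be positive")

    observed_rate = errors_found / items_checked

    if observed_rate <= low_error_rate:
        # Very few errors: shrink the sample to about 2 percent.
        return min(current_fraction, 0.02)

    if observed_rate >= high_error_rate:
        # Numerous errors: grow to at least 10 percent, then keep growing.
        # Doubling is an assumed growth step; the cap is the whole database.
        return min(1.0, max(0.10, current_fraction * 2))

    # Otherwise keep the current sample size.
    return current_fraction
```

Starting from the 5 percent sample, a clean spot check would suggest 2 percent for the next round, while a spot check with many errors would suggest 10 percent.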

There are several good sources to use for comparison. If you originally got the data from a discovery tool, you can compare the data in the CMDB to the same data in the discovery tool. If you are reconciling discovery data with your production CMDB daily, you should find very few errors.
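A comparison against discovery data might look like the following sketch. It assumes both the CMDB and the discovery tool can export records as dictionaries keyed by a shared CI identifier; the function name, the record layout, and the attribute names are hypothetical and are not the interface of any particular tool.

```python
def find_differences(cmdb, discovered, attributes):
    """Compare CMDB records with discovery records, CI by CI.

    Both inputs are assumed to map a CI identifier to a dict of attributes.
    Returns tuples of (ci_id, what_differs, cmdb_value, discovered_value).
    """
    differences = []
    for ci_id in sorted(set(cmdb) | set(discovered)):
        if ci_id not in discovered:
            differences.append((ci_id, "missing from discovery data", None, None))
        elif ci_id not in cmdb:
            differences.append((ci_id, "missing from CMDB", None, None))
        else:
            for attr in attributes:
                cmdb_value = cmdb[ci_id].get(attr)
                found_value = discovered[ci_id].get(attr)
                if cmdb_value != found_value:
                    differences.append((ci_id, attr, cmdb_value, found_value))
    return differences
```

Each tuple in the result is only a candidate error. It still has to be logged and investigated before it counts against accuracy.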

Another possible source is a physical comparison against the actual item. For software, this could be as easy as selecting “Help” and “About” to determine the version of the software and comparing that to the version listed in the CMDB. For hardware, this might mean going to where the hardware is located and finding characteristics such as the serial number and model number printed on the case. When doing this kind of comparison, it is clearly impossible to verify every single CI and relationship monthly. You will need to select a representative sample and extrapolate to estimate the number of errors in the whole database. You’ll want to choose a large enough sample so that a single error doesn’t skew your results too far. If you compare only ten items and find that one is incorrect, you have a 10 percent error rate. Adding another 90 items to the comparison without finding another error brings your error rate down to only 1 percent.
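The sampling and extrapolation arithmetic can be sketched as follows. The function names are hypothetical, and the random selection is one simple way to get a representative sample, assuming every CI is equally easy to check.

```python
import random


def draw_sample(ci_ids, fraction, seed=None):
    """Pick a random sample of CI identifiers covering the given fraction."""
    ci_ids = list(ci_ids)
    rng = random.Random(seed)
    size = max(1, round(len(ci_ids) * fraction))
    return rng.sample(ci_ids, size)


def extrapolate_errors(errors_in_sample, sample_size, total_items):
    """Return (sample error rate, estimated errors in the whole database)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    rate = errors_in_sample / sample_size
    return rate, round(rate * total_items)
```

With one error in ten items, `extrapolate_errors(1, 10, 2000)` gives a 10 percent rate and an estimate of 200 errors in a 2,000-item database. With one error in 100 items, the same database is estimated to hold only 20 errors, which shows how strongly a small sample can skew the result.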

The larger the sample size, the longer it will take to do the comparison and investigate the differences. A larger sample will get you closer to the true accuracy of your database, however. Finding the proper balance of sample size and effort is a matter of maturity and may take some time.

In addition to finding errors by actively looking for them, you will be able to discover some errors by using the configuration data in your operational processes. Each time someone raises a request for change (RFC), they should be looking at configuration data to assess the impact of the proposed change. Often the person raising the RFC will be very knowledgeable about the environment and will be able to spot some errors just by looking at the data. You need to train your staff not to simply get frustrated in silence when this happens. Instead, people finding errors while looking at the data should be encouraged to report them and rewarded for doing so. It wouldn’t be out of the question to even put a “bounty” of some kind on each error, rewarding people for paying attention to details and improving the accuracy of the data.

The incident management process often results in someone digging deeply into a particular part of the environment. These people should be encouraged to take a few extra seconds to compare what they find with the CMDB. If they do this comparison before resetting a router or rebooting a server, you will get many of the benefits of a physical inventory without having to incur the cost. By helping improve the accuracy of the CMDB in this way, you also improve the IT staff’s confidence in the data. The more people who feel responsible for accuracy, the higher your accuracy is likely to be. The following figure summarizes these three ways to detect CMDB errors.

Figure: There are three ways to actively find errors in the CMDB.

Your operational processes are also the most likely place to find errors such as undiscovered CIs and stale relationships. As people become more experienced in using configuration data, they will begin to sense when pieces of the environment are not described completely. You’ll start to hear people say things like “I know there should be a router here, but it’s not described in the CMDB.” At this point, you know that the organization has really embraced configuration management and is maturing in its understanding of the interrelationships of all the processes.

Spot checks and operational processes should work together to give you the best possible coverage of your entire database. You should specifically select your physical audit data to include things that haven’t been involved recently in one of the operational processes. Likewise, if you have selected something for a spot check and realize that a technician has checked that CI recently, you can remove it from the list. Seeing both sides as part of the same effort helps to reduce your costs and increase coverage.
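One way to combine the two sources of coverage is to filter the audit list by when each CI was last verified through an operational process. The sketch below assumes you can obtain such a timestamp per CI. The function name and the 30-day definition of "recently" are illustrative assumptions, not values from the text.

```python
from datetime import datetime, timedelta


def select_audit_candidates(ci_ids, last_verified, now,
                            recent_window=timedelta(days=30)):
    """Keep only CIs that no operational process has verified recently.

    last_verified maps a CI identifier to the datetime a technician last
    compared it with the CMDB. The 30-day window is an assumed default.
    """
    candidates = []
    for ci_id in ci_ids:
        verified_at = last_verified.get(ci_id)
        if verified_at is None or now - verified_at > recent_window:
            candidates.append(ci_id)
    return candidates
```

A CI that was checked during an incident last week drops off the list, while one that has never been touched by an operational process stays on it.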

An important part of finding errors is having a good means of logging them. Some organizations like to create a special type of incident to track errors they find with configuration data. This allows you to use the incident tracking system you probably already have, but it requires that you adjust reports and dashboards to exclude these tickets, because they don’t represent degradation of service to any of your IT consumers. Another possibility, if your tool is flexible enough, is to create a table in the CMDB to track the errors. This has the benefit of easily relating the errors to the CI or relationship involved, but requires that you create input screens to enter the data. Of course, if neither the incident system nor the CMDB is a suitable place for you to store error information, you can always use a spreadsheet to do your tracking. The tool you choose is not nearly as important as the information you track.

For every error, you should capture the person who found it, the way it was found, the CI or relationship identifier where the error is found, and the date and time the error was uncovered. This information will help you determine the source of the error and find ways to prevent it from happening in the future.
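Whatever tool you use, the record itself is small. The following sketch shows the four pieces of information as a Python data class. The class and field names are hypothetical, and the example values are made up for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class CmdbErrorRecord:
    found_by: str                # the person who found the error
    detection_method: str        # e.g. "spot check", "RFC review", "incident"
    ci_or_relationship_id: str   # identifier of the CI or relationship in error
    found_at: datetime           # date and time the error was uncovered


# Example record with made-up values.
record = CmdbErrorRecord(
    found_by="j.smith",
    detection_method="spot check",
    ci_or_relationship_id="srv-01",
    found_at=datetime(2024, 6, 30, 14, 5),
)

# asdict() gives a plain dictionary that is easy to write to a spreadsheet row.
row = asdict(record)
```

The same four fields could just as well be columns in a spreadsheet, fields on a special incident type, or a table in the CMDB.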

Investigate and Sort Errors

After you have found and logged all errors, the next step is to investigate them to determine whether they truly are errors. This is where you take into consideration the timing situations described earlier. Each of your various operational processes is likely to have a different cycle, so you can’t simply check whether the error occurred some number of days after the last incident or change. Instead you need to consider each discovered error, understand when the data should have been changed by your standard processes, and then eliminate each one that has a valid explanation for the data inconsistency.

Investigating errors is a time-consuming process. You should actively look for opportunities to automate it as much as possible. For example, if you can create a clever query that crosschecks data between the CMDB, the incident ticketing system, and the change management tool, you could produce a report that shows the most recent change and incident for each CI that has an error. Such a report can help to eliminate many timing errors very quickly.
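Such a cross-check could be sketched as below. The sketch assumes you can export changes and incidents as simple (CI identifier, timestamp) pairs. A real implementation would more likely be a query against your ticketing and change tools, whose schemas are not described here.

```python
def latest_activity_report(error_ci_ids, changes, incidents):
    """For each CI with a logged error, show its most recent change and incident.

    changes and incidents are assumed to be iterables of
    (ci_id, timestamp) pairs exported from the respective tools.
    """
    def latest_by_ci(events):
        latest = {}
        for ci_id, timestamp in events:
            if ci_id not in latest or timestamp > latest[ci_id]:
                latest[ci_id] = timestamp
        return latest

    latest_change = latest_by_ci(changes)
    latest_incident = latest_by_ci(incidents)

    return [
        {
            "ci_id": ci_id,
            "last_change": latest_change.get(ci_id),      # None if no change found
            "last_incident": latest_incident.get(ci_id),  # None if no incident found
        }
        for ci_id in error_ci_ids
    ]
```

An investigator can then scan the report for errors whose CI was changed shortly before the error was logged, since those are the likeliest timing errors.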

As you consider the policy for defining accuracy, you’re likely to see other opportunities to build reports or queries that can sort the false errors from the real ones more quickly. It will probably not be possible to completely automate the error investigation, but you should get as close as you can. When you have investigated all errors, you should indicate in your tracking system which ones remain as errors and which ones can be explained. Usually this can be accomplished with a simple status field for each error.
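The status field can be filled in partly by automation. The sketch below marks an error as explained when it was found while a recent process still had time, under your policy, to update the CMDB. The status names, the per-process update cycles, and the function name are all assumptions made for illustration. Your own accuracy policy defines the real rules.

```python
from datetime import datetime, timedelta

# Hypothetical update cycles: how long each process is allowed to take
# before its results must be reflected in the CMDB.
UPDATE_CYCLE = {
    "change": timedelta(days=2),
    "incident": timedelta(days=1),
}


def classify_error(found_at, recent_events, update_cycle=UPDATE_CYCLE):
    """Return "explained" or "confirmed" for one logged error.

    recent_events is a list of (process_name, completed_at) pairs for the
    CI in question. An error is treated as explained when it was found
    inside the allowed update window of at least one of those events.
    """
    for process_name, completed_at in recent_events:
        cycle = update_cycle.get(process_name)
        if cycle is None:
            continue  # unknown process: no automatic explanation
        if completed_at <= found_at <= completed_at + cycle:
            return "explained"
    return "confirmed"
```

Errors left as "confirmed" by a rule like this still deserve a human look, since the rule only removes the easy timing cases.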

Reporting Overall CMDB Accuracy

Most senior IT managers want to think of CMDB accuracy in terms of a single percentage. In my experience, people want to talk about the CMDB as being somewhere between 90 and 100 percent accurate. That’s a great measurement, but getting there from a simple list of discovered errors is not straightforward. The formula for percentage accuracy is built on the number of errors discovered divided by the number of opportunities for error. That ratio is the error rate, and accuracy is what remains when you subtract it from 100 percent. It’s the second number, the opportunities for error, that takes some consideration. Let’s think about how to define the denominator of the formula shown in the following figure.

Figure: There is a simple formula describing the accuracy of the CMDB.
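As a minimal sketch of the arithmetic, the function below turns a count of confirmed errors and a count of opportunities for error into a single percentage. It assumes accuracy is reported as 100 percent minus the error rate. How you count `opportunities` is left as an input, because defining that denominator is a policy decision.

```python
def accuracy_percent(confirmed_errors, opportunities):
    """Percentage accuracy: 100 percent minus the error rate.

    confirmed_errors: errors that remain after investigation.
    opportunities: the number of opportunities for error, however your
    policy defines that denominator.
    """
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= confirmed_errors <= opportunities:
        raise ValueError("confirmed_errors must be between 0 and opportunities")
    return 100.0 * (opportunities - confirmed_errors) / opportunities
```

For example, 50 confirmed errors against 1,000 opportunities for error gives `accuracy_percent(50, 1000)`, which is 95 percent.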

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd