Data Integration - ITIL Configuration Management

The fastest way to populate a CMDB is to take data that already exists elsewhere and reorganize it to fit the schema you’ve planned. With this approach, a master copy of the data must be stored somewhere. That place is called the “source of truth”: the primary place where the data is created and edited.

You may be unsure of the quality of the data in your sources or, even worse, sure that it is inaccurate. In information systems design and theory, as applied at the enterprise level, Single Source of Truth (SSOT) refers to the practice of structuring information models and associated schemata so that every data element is stored exactly once (e.g. in no more than a single row of a single table). Any linkages to this data element (possibly in other areas of the relational schema or even in distant federated databases) are by reference only. Thus, when any such data element is updated, the update propagates to the enterprise at large, with no possibility of a duplicate value somewhere in the enterprise being left stale, because there are no duplicate values to update.

Deployment of this architecture is becoming increasingly important in enterprise settings where incorrectly linked duplicate or denormalized data elements (a direct consequence of intentional or unintentional denormalization of an explicit data model) pose a risk of retrieving outdated, and therefore incorrect, information. A common example is the electronic health record, where it is imperative to validate patient identity accurately against a single referential repository, which serves as the SSOT. Where the same data must appear in more than one place within the enterprise, it is represented by pointers rather than by duplicate database tables, rows, or cells. This ensures that updates to elements in the authoritative location reach all federated database constituencies in the larger enterprise architecture.
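The idea of storing a value once and linking to it by reference can be sketched in a few lines of Python. This is only an illustration under assumed names: the `patients` store, the `Patient` and `Appointment` classes, and the sample values are all hypothetical and do not come from any particular health record system.

```python
from dataclasses import dataclass

# Hypothetical authoritative store: each patient record lives here exactly once.
patients = {}


@dataclass
class Patient:
    patient_id: str
    name: str


@dataclass
class Appointment:
    # Holds a reference (the key) to the patient, not a copy of the name.
    patient_id: str
    date: str

    def patient_name(self) -> str:
        # Resolve through the single source of truth at read time.
        return patients[self.patient_id].name


patients["p-001"] = Patient("p-001", "A. Smith")
visit = Appointment("p-001", "2020-03-01")

# One update in the authoritative location...
patients["p-001"].name = "A. Smith-Jones"

# ...is visible to every record that refers to it.
print(visit.patient_name())
```

Because `Appointment` never stores the name itself, there is no second copy that could be left out of date.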

When you decide to create a new source of truth in your CMDB, you are building it by import. The choice between federation and import has significant implications for how you populate the database.

Distributed Sources of Truth

In a distributed source of truth, the data lives somewhere outside the central CMDB and is updated and managed by processes whose chief aim is to keep it accurate. For example, most organizations have an HR system or directory that is maintained electronically.

Hiring, promotions, changes, and retirements are linked to this directory, and any change in status is recorded. The HR system or directory is therefore the source of truth, while the CMDB, which holds a copy of this information, is known as a federated data source. The distinction between the source of truth and its copies is critical because it affects the timing and accuracy of the data you can populate. When the source of truth resides outside the CMDB, you might want to build a structure that allows instant or at least very frequent updates.
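A frequent refresh of the CMDB copy from an external source of truth might look like the following minimal sketch. It assumes both sides can be read as dictionaries keyed by an employee ID; the function name `refresh_federated_copy`, the field names, and the sample records are hypothetical, and a real integration would depend on the interfaces the HR system actually exposes.

```python
from typing import Dict


def refresh_federated_copy(source: Dict[str, dict],
                           cmdb_copy: Dict[str, dict]) -> Dict[str, int]:
    """Make cmdb_copy mirror source; return counts of what changed."""
    stats = {"added": 0, "updated": 0, "removed": 0}
    for key, record in source.items():
        if key not in cmdb_copy:
            stats["added"] += 1
        elif cmdb_copy[key] != record:
            stats["updated"] += 1
        else:
            continue  # already in sync
        cmdb_copy[key] = dict(record)  # the source always wins
    for key in list(cmdb_copy):
        if key not in source:
            del cmdb_copy[key]  # gone from the source of truth
            stats["removed"] += 1
    return stats


# Source of truth: the HR directory (sample data).
hr_directory = {
    "e100": {"name": "Ana", "title": "Manager"},   # promoted
    "e101": {"name": "Raj", "title": "Analyst"},   # new hire
}
# Stale copy held in the CMDB.
cmdb_people = {
    "e100": {"name": "Ana", "title": "Analyst"},
    "e099": {"name": "Lee", "title": "Engineer"},  # has since retired
}

stats = refresh_federated_copy(hr_directory, cmdb_people)
print(stats)
```

The copy is never edited in place by CMDB users in this sketch; every change flows one way, from the source of truth, and how often the function runs determines how stale the copy can get.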

Another reason to understand which data sources will need to be federated is that, in most cases, these will be the first sources you can populate in your database. The table summarizes the differences between federated data sources and data that will be directly entered in the CMDB.

Table: Comparing Federation to Direct Entry

Building a New Source of Truth

Federation is a great approach to use when you have trusted data sources and solid processes that maintain those data sources. Of course, with both trusted sources and reliable processes, you could probably build a CMDB without much more help. In the real world, where processes are not followed consistently and data is unreliable, you are often faced with creating a new source of truth.

This method is called direct entry. Direct entry is used when you don’t need to respect the source of the data, or when that source is going to change frequently.

A classic example is an inventory of each data center, using a consistent approach to record data about the servers, network devices, software, and other items of interest. Perhaps you would use a discovery tool for the job to make sure you capture the technical details accurately. As you can imagine, performing an inventory is expensive and time consuming, so it may not be the first part of your CMDB population effort to get funded, but in some cases it cannot be avoided because of the value of the information that can be gained. In this case, direct entry would actually involve running some sort of loading program to take information from the discovery tool and load it into the CMDB. If your organization is not used to a disciplined approach to keeping accurate data, you’ll most likely need to introduce that kind of approach by creating a new source of truth.
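A loading program of this kind can be sketched as follows. The sketch assumes the discovery tool can export a CSV file with `hostname`, `serial`, `manufacturer`, and `model` columns; that layout, the sample rows, and the function name `load_discovered_servers` are assumptions made for illustration, since real discovery tools each have their own export formats.

```python
import csv
import io

# Hypothetical discovery export; real tools have their own formats.
DISCOVERY_EXPORT = """hostname,serial,manufacturer,model
web01,SN123,IBM,x3650
db01,SN456,Dell,R740
web01,SN123,IBM,x3650
"""


def load_discovered_servers(export_text, cmdb):
    """Upsert one configuration item per serial number; return rows read."""
    rows = 0
    for row in csv.DictReader(io.StringIO(export_text)):
        rows += 1
        serial = row["serial"].strip()
        if not serial:
            continue  # skip rows that cannot be identified
        # Keying on the serial number means a repeated scan of the same
        # server overwrites the earlier entry instead of duplicating it.
        cmdb[serial] = {
            "type": "server",
            "hostname": row["hostname"].strip(),
            "manufacturer": row["manufacturer"].strip(),
            "model": row["model"].strip(),
        }
    return rows


cmdb_items = {}
rows_read = load_discovered_servers(DISCOVERY_EXPORT, cmdb_items)
print(rows_read, len(cmdb_items))
```

Using the serial number as the key is one possible design choice; any attribute that identifies a device uniquely and consistently across scans would serve the same purpose.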

Data Correlation and Reconciliation

The biggest challenge in populating data is dealing with too much of it. When data from several sources is combined, there are often conflicts that need to be addressed. One of the most common issues is data that means the same thing but is represented differently. For example, suppose you want to maintain information about the manufacturer of every piece of hardware in your database. When integrating data, you have two different sources of data about server hardware, perhaps from different discovery tools deployed in different data centers. The first source might code one set of servers with a manufacturer code of “IBM.” The second source might code a different set of servers with a manufacturer code of “Int Bus Machines.” This is a classic example of data that is semantically the same but syntactically different, and it requires reconciliation by normalizing the data into a common format.

Solving this issue is not easy. First you need to understand its scope by visually inspecting the relevant records from each source of data. Following the previous example, someone would pull the manufacturer codes for every record from both sources into a spreadsheet, and then sort to find all the possible codes. Duplicates need to be eliminated, and the list needs to be scanned visually to find the items that mean the same thing. For each such group, a single representation is adopted and recorded in a lookup table. The data transfer mechanism is then configured to look up each value in the table and store the new value whenever it sees an old one. The obvious downside is that this transformation must happen each time the data is transferred, which makes large transfers rather slow and can hurt performance on the receiving CMDB.
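The reconciliation steps described above can be sketched in Python. The two sample sources, the field names, and the contents of `MANUFACTURER_LOOKUP` are made up for illustration; in practice the lookup table is built by a person after reviewing the list of distinct codes.

```python
# Sample records from two hypothetical discovery sources.
source_a = [
    {"host": "srv1", "manufacturer": "IBM"},
    {"host": "srv2", "manufacturer": "Dell"},
]
source_b = [
    {"host": "srv3", "manufacturer": "Int Bus Machines"},
    {"host": "srv4", "manufacturer": "IBM"},
]

# Step 1: collect every distinct code from both sources, sorted for review.
distinct_codes = sorted({r["manufacturer"] for r in source_a + source_b})
print(distinct_codes)

# Step 2: lookup table from each old code to the adopted one, written by
# hand after scanning the list above.
MANUFACTURER_LOOKUP = {"Int Bus Machines": "IBM"}


def normalize(record):
    """Return a copy of record with its manufacturer code normalized."""
    code = record["manufacturer"]
    # Codes not in the table are already in the adopted form.
    return {**record, "manufacturer": MANUFACTURER_LOOKUP.get(code, code)}


# Step 3: apply the lookup on every transfer into the CMDB.
cmdb_records = [normalize(r) for r in source_a + source_b]
print(sorted({r["manufacturer"] for r in cmdb_records}))
```

Since `normalize` runs for every record on every transfer, its cost grows with the size of the transfer, which is the performance downside noted above.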

The figure demonstrates an example of data reconciliation with transformation directly into the CMDB.

Figure: Data reconciliation

Sometimes reconciled data is pushed into the CMDB with a reference table.

Another common issue in data integration projects is correlation. This arises when you try to relate two pieces of data that have nothing in common. For example, you might have location data from a human resources system, where the key to each record is a mostly random location code that is automatically generated by the HR system. That location data is likely to have useful information, such as street address, city, and country, for each location. On the other hand, your workstation information from an asset management system might list locations only as a building, floor, and room number. In your CMDB schema, you would like to put these pieces together so you know the building, floor, and city for each workstation, but there is no common key that lets you match a location from the asset system to a location from the HR system.

The first possible solution is to find a piece of information the two sources share and make it a key. If they share nothing, the next option is to find a third data source that contains information in common with each of the first two.
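Correlating through a third source can be sketched as a two-step lookup. Everything here is hypothetical: the location codes, the building names, and in particular `building_to_location`, which stands in for a third source (for example, a facilities list) that happens to record both a building name and the HR location code.

```python
# HR system: keyed by an auto-generated location code.
hr_locations = {
    "LX7Q2": {"street": "1 Main St", "city": "Austin", "country": "US"},
    "K93ZP": {"street": "5 High Rd", "city": "Leeds", "country": "UK"},
}

# Asset system: knows only building, floor, and room.
workstations = [
    {"asset": "ws-001", "building": "B1", "floor": 2, "room": "214"},
    {"asset": "ws-002", "building": "B7", "floor": 1, "room": "102"},
    {"asset": "ws-003", "building": "B9", "floor": 3, "room": "301"},
]

# Hypothetical third source that bridges the two keys.
building_to_location = {"B1": "LX7Q2", "B7": "K93ZP"}


def correlate(assets, bridge, locations):
    """Attach city and country to each asset via the bridging source."""
    matched, unmatched = [], []
    for ws in assets:
        code = bridge.get(ws["building"])  # building -> HR location code
        loc = locations.get(code)          # HR location code -> address
        if loc is None:
            unmatched.append(ws["asset"])  # left for manual correlation
        else:
            matched.append({**ws, "city": loc["city"],
                            "country": loc["country"]})
    return matched, unmatched


matched, unmatched = correlate(workstations, building_to_location, hr_locations)
print([m["city"] for m in matched], unmatched)
```

Records that the bridge cannot resolve are collected separately rather than dropped, so they can be handled by hand.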

Finally, if no suitable correlating data source can be found, you will be forced to correlate the data sources manually. As with data reconciliation, the best tool for this kind of manual data scrubbing is a spreadsheet.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
