Garbage In, Garbage Out: Getting Good Data Out of Your BI Systems

Updated: September 03, 2010

What does this study recommend? The short answer: start thinking of your organization as being in the business of gathering data, turning it into information, and using that information as effectively as possible. In other words, think of your organization as trying to get as much high-quality, potentially useful information into your BI solution as possible, analyzing that information, and then using the analysis to make decisions as rapidly as possible. Then develop a set of metrics that tells you how well you are doing and where the weak points in the process are.

This set of metrics measures what I call data usefulness. I define data usefulness as the ability to deliver all needed accurate, consistent, and appropriate data to the right user in a timely fashion. My survey in the study convinces me that there are significant and growing problems at every point in the process of converting data into useful information. Table 1 shows my take on the typical steps in the data-delivery process, the metrics by which the effectiveness of each step should be judged, and the problems that many organizations are seeing today at each step. The key takeaway is that fixing one or two steps will not, in the long run, fix the overall data-usefulness problem. Rather, organizations of all sizes need to take a comprehensive, long-term approach to ensuring data usefulness.

Table 1: The Data Delivery Cycle

| Step | Metric | Example | Problem |
|---|---|---|---|
| Data entry | Accuracy | Percent of data items with errors | Majority of businesses report more than 15% of items with errors |
| Data consolidation | Consistency | Number of data items with multiple records and no master record | Majority of businesses report more than half their data inconsistent |
| Data aggregation | Scope | Percent of data sources on which a cross-data-source query can be performed | Majority of businesses report they can't do cross-database query on more than 2/3 of company data |
| Information targeting | Fit | Percent of time data delivered that is not appropriate to end user | Majority of businesses report more than 60% of the time, data delivered to executives inappropriate |
| Information delivery | Timeliness | Time taken to deliver (entry to arrival on screen) to average user | Majority of businesses report a week or more average time to deliver |
| Information analysis | Analyzability | Percent of time user can't immediately do online analysis of data received | Majority of businesses report can't do immediate online analysis more than ½ the time |
| Process adjustment | Agility | Percent of new outside data sources not available within ½ year | Majority of businesses report more than ¾ of relevant new Web information not made available inside the company within ½ year |
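To make a few of these metrics concrete, here is a minimal Python sketch of how the accuracy, consistency, and timeliness measures from the table might be computed over a small set of records. Everything in it is an assumption for illustration: the record layout, the field names (`item`, `is_master`, `has_error`, `entered`, `delivered`), the function names, and the sample data are hypothetical and do not come from the study.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"item": "cust-001", "is_master": True,  "has_error": False,
     "entered": datetime(2010, 8, 2), "delivered": datetime(2010, 8, 9)},
    {"item": "cust-001", "is_master": False, "has_error": True,
     "entered": datetime(2010, 8, 3), "delivered": datetime(2010, 8, 12)},
    {"item": "cust-002", "is_master": False, "has_error": False,
     "entered": datetime(2010, 8, 4), "delivered": datetime(2010, 8, 10)},
    {"item": "cust-002", "is_master": False, "has_error": False,
     "entered": datetime(2010, 8, 5), "delivered": datetime(2010, 8, 15)},
    {"item": "cust-003", "is_master": False, "has_error": False,
     "entered": datetime(2010, 8, 6), "delivered": datetime(2010, 8, 14)},
]


def error_rate(rows):
    """Accuracy: percent of records flagged as containing an error."""
    if not rows:
        return 0.0
    return 100.0 * sum(1 for r in rows if r["has_error"]) / len(rows)


def items_without_master(rows):
    """Consistency: count of items with multiple records and no master record."""
    by_item = defaultdict(list)
    for r in rows:
        by_item[r["item"]].append(r)
    return sum(
        1
        for group in by_item.values()
        if len(group) > 1 and not any(r["is_master"] for r in group)
    )


def mean_delivery_days(rows):
    """Timeliness: average days from data entry to arrival on the user's screen."""
    if not rows:
        return 0.0
    days = [(r["delivered"] - r["entered"]).total_seconds() / 86400 for r in rows]
    return sum(days) / len(days)


if __name__ == "__main__":
    print(f"Records with errors: {error_rate(records):.1f}%")
    print(f"Items with no master record: {items_without_master(records)}")
    print(f"Average delivery time: {mean_delivery_days(records):.1f} days")
```

On this sample data, one record in five has an error, one item (`cust-002`) has multiple records and no master, and the average delivery time is eight days. The point of tracking the numbers side by side is the one made above: a good score on one step says little about the others.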

The Data Delivery Cycle also shows that, by users' own estimation, more than two-thirds of the data that flows into the organization is not used effectively. In fact, if you include the inability to bring new sources of data into BI, more than three-quarters of the useful data out there is never used effectively.