Wednesday, 27 January 2010

A balanced approach to scoring data quality

Traditionally, a Data Quality dashboard would attempt to measure the quality of data against a number of data quality dimensions, such as:

Completeness, Accuracy, Consistency, Uniqueness, Conformity, Timeliness

Depending on how the data performed against each dimension, it would be given a percentage score, typically displayed on an intuitively designed dashboard with a Red/Amber/Green status.
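As a rough illustration of that kind of scoring, here is a minimal Python sketch that works out a completeness percentage for a single column and maps it to a Red/Amber/Green status. The threshold values, the function names and the sample data are all assumptions made for the example; every organisation would set its own boundaries.

```python
from typing import Optional, Sequence

# Hypothetical thresholds: where the Red/Amber/Green boundaries sit
# is an assumption for this sketch, not a standard.
AMBER_THRESHOLD = 70.0
GREEN_THRESHOLD = 90.0


def completeness_pct(values: Sequence[Optional[str]]) -> float:
    """Percentage of values that are populated (not None and not blank)."""
    if not values:
        return 0.0
    populated = sum(1 for v in values if v is not None and v.strip() != "")
    return 100.0 * populated / len(values)


def rag_status(score_pct: float) -> str:
    """Map a percentage score to a Red/Amber/Green label."""
    if score_pct >= GREEN_THRESHOLD:
        return "GREEN"
    if score_pct >= AMBER_THRESHOLD:
        return "AMBER"
    return "RED"


# Made-up sample column; blank and whitespace-only values count as missing.
address_line_2 = ["Flat 3", None, "", "Unit 7", None,
                  "  ", "Floor 2", None, None, "Suite 1"]

score = completeness_pct(address_line_2)
print(f"Address_Line_2 completeness: {score:.0f}% ({rag_status(score)})")
```

The same pattern could be repeated for each dimension, with a different measuring function in place of `completeness_pct`.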

As a data quality practitioner I generally found the information displayed on these dashboards useful. It allowed me to target areas where dimensions were below a desired level, and to undertake further analysis into the root causes.

Problems with this approach

Looking at completeness of our datasets is great, but is it really important to us that ‘Address_Line_2’ in ‘CustomerAddresses’ is only 47% complete?

Zip code is 76% accurate but how do we measure accuracy? 90210 is an accurate zip code, but I live in London, not Beverly Hills.
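To make the distinction concrete, the sketch below separates a format check (is this a well-formed zip code?) from an accuracy check (does it belong to this customer's city?). The pattern is a simplified US zip format, and the `ZIP_TO_CITY` lookup is a hypothetical stand-in for whatever reference data you have available.

```python
import re
from typing import Dict

# Simplified US zip format: five digits, optionally followed by -NNNN.
US_ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

# Hypothetical reference data; a real check would use a full lookup source.
ZIP_TO_CITY: Dict[str, str] = {"90210": "Beverly Hills"}


def is_valid_zip_format(zip_code: str) -> bool:
    """True if the value looks like a zip code, regardless of who owns it."""
    return US_ZIP_PATTERN.fullmatch(zip_code) is not None


def is_accurate_for_city(zip_code: str, city: str,
                         reference: Dict[str, str]) -> bool:
    """True only if the reference data places this zip code in the given city."""
    return reference.get(zip_code, "").lower() == city.lower()


print(is_valid_zip_format("90210"))                          # True
print(is_accurate_for_city("90210", "London", ZIP_TO_CITY))  # False
```

A dashboard that only runs the first check would happily report my record as accurate.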

It’s all about the context

In his 1890 book, Principles of Economics, Alfred Marshall commented that “In common use almost every word has many shades of meaning, and therefore needs to be interpreted by the context.”

Applying context to a data quality measurement allows us to assess the impact of data quality in association with a business purpose. This allows us to incorporate business-relevant metrics into our DQ scoring dashboard.

For instance, Marketing require Tel_No, Address_1, Town, Country, Postcode & E-Mail to be populated. The postcode should adhere to Royal Mail standards if it is a UK postcode, and should be blank if Country = Ireland. The E-Mail field should contain a validly formed e-mail address.

If a record adheres to the above rules we can flag it as ‘GREEN’, and once the recordset has been aggregated we can measure % records that are of ‘Good Quality’ for Marketing.
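The Marketing rules above could be sketched in Python along these lines. Several things here are assumptions for illustration only: the postcode and e-mail patterns are simplified approximations (not full Royal Mail or e-mail address validation), the country values "UK" and "Ireland" are assumed to be stored as text, any record that fails a rule is labelled 'RED', and the sample records are made up.

```python
import re
from typing import Dict, List, Optional

# Simplified approximations only, not full Royal Mail or e-mail validation.
UK_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Postcode is handled separately because its rule depends on Country.
REQUIRED_FIELDS = ("Tel_No", "Address_1", "Town", "Country", "E-Mail")

Record = Dict[str, Optional[str]]


def is_populated(value: Optional[str]) -> bool:
    return value is not None and value.strip() != ""


def marketing_status(record: Record) -> str:
    """Return 'GREEN' if the record meets every Marketing rule, else 'RED'."""
    if not all(is_populated(record.get(f)) for f in REQUIRED_FIELDS):
        return "RED"
    country = record["Country"].strip().upper()
    postcode = (record.get("Postcode") or "").strip().upper()
    if country == "IRELAND":
        if postcode:                      # must be blank for Ireland
            return "RED"
    elif not postcode:                    # must be populated elsewhere
        return "RED"
    elif country == "UK" and not UK_POSTCODE.fullmatch(postcode):
        return "RED"
    if not EMAIL.fullmatch(record["E-Mail"].strip()):
        return "RED"
    return "GREEN"


def pct_good_quality(records: List[Record]) -> float:
    """Percentage of records flagged GREEN for Marketing."""
    if not records:
        return 0.0
    green = sum(1 for r in records if marketing_status(r) == "GREEN")
    return 100.0 * green / len(records)


base = {"Tel_No": "020 7946 0000", "Address_1": "1 High Street",
        "Town": "London", "E-Mail": "someone@example.com"}
records = [
    {**base, "Country": "UK", "Postcode": "SW1A 1AA"},       # passes
    {**base, "Country": "Ireland", "Postcode": ""},          # passes
    {**base, "Country": "Ireland", "Postcode": "D02 X285"},  # should be blank
    {**base, "Country": "UK", "Postcode": "90210"},          # wrong format
]
print(f"Good quality for Marketing: {pct_good_quality(records):.0f}%")
```

Keeping each business function's rules in its own function like this means the same recordset can be scored separately for Marketing, Finance or anyone else with a different definition of 'good'.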

But there are limitations

The above methods have consistently been implemented very well by many of the large players in the Data Quality market. However, I still don’t feel completely satisfied by what the dashboard is communicating.

Yes, technically, the data can be measured, and with the context of business rules applied we can measure whether the data is fit for the intended business purpose. But I still feel that we’re only touching the surface of measuring data quality within an organisation.

A Balanced approach

I propose that we have matured beyond this traditional method of measuring data quality. In order to fully measure the impact of Data Quality Management initiatives within an organisation we need to look above and beyond data dimensions, and start to also look towards:
  • the organisation's (our customers') perception of data quality efforts
  • the understanding of data quality management within the organisation
  • the cost benefit we are delivering to the organisation.
Only when we understand all of these things can we measure the true impact of Data Quality Management.

You may or may not have heard the term ‘Balanced Scorecard’ before. One published definition states that a Balanced Scorecard “was originated by Drs. Robert Kaplan (Harvard Business School) and David Norton as a performance measurement framework that added strategic non-financial performance measures to traditional financial metrics to give managers and executives a more 'balanced' view of organizational performance.”

An adapted balanced scorecard approach, such as the diagram below, would suit the needs of achieving a balanced view of scoring Data Quality.

The four sections that make up the scorecard are equally important to us in allowing the performance of Data Quality Management to be measured.

Over the coming weeks I want to delve deeper into this subject, taking each section of the scorecard and discussing key metrics that we can assess, and how we can then slot all the sections together to build a balanced scorecard within your organisation.


Charles Blyth said...


Great post. I think the ‘Balanced Scorecard' approach is an excellent idea and ties in very well with true enterprise performance management and continuous improvement. Looking forward to the rest of the series.


Dylan Jones from Data Quality said...

Excellent Phil, looking forward to where you take this, got some ideas myself so will come back regularly.

Also, Arkady Maydanchik has posted some useful advice on DQ Pro last year, might give another perspective:

Satesh said...

Good one Phil, the article neatly describes the evolution of DQ measurement parameters, from Good to Better to Best where

Good - data profiling exercise in ur article 'Address_Line_2’

Better - Associating KPI with DQ

Best - Balance Score card approach to DQ

I have also been thinking on similar lines (for an article) but my thought was limited to the 'better' part of DQ i.e. associating KPI with DQ

But your balanced score card approach towards DQ is certainly something i am looking forward to hear more on

Great Going!!!


Phil Wright said...

Charles / Dylan / Satesh - thanks for the comments. I'll have the next article up on in a day or so, would love to hear your thoughts as well.

Dylan, I'll be sure to read Arkady's scorecard advice from your link - I read his book a couple of years ago, so I'm sure I'll enjoy.

Garnie Bolling said...

Phil, great start, looking forward in reading the rest of your posts about Customer, Financial, Process and Dimensions... thanks for sharing your insights...

Phil Wright said...

Thanks Garnie, I've actually just posted the 2nd part this afternoon.
