Thursday, 25 February 2010

The First Step on your Data Quality Roadmap

You may be about to embark on an exciting Data Quality initiative, full of enthusiasm and armed with the belief that you can change organisational culture and save the business from poor data quality.

To roll out policies and procedures effectively, you may be brainstorming, and then formalising, a Data Quality Roadmap.

What is a Data Quality Roadmap?

It's a strategic enabler.

It allows you to combine your short-term and long-term goals.

It's a framework for progress.

It helps you to co-ordinate people, process and technology, and enables you to communicate where you are, and where you're heading, in a digestible and measurable way.

Sounds great, where should I start?

As George Santayana once wrote:

"Those who cannot remember the past are condemned to repeat it."

With this quote in mind, I would propose that the first step on any Data Quality Roadmap should be to understand which DQ improvement or management initiatives have previously been undertaken within your organisation.
  • Who was responsible for previous initiatives?
  • What processes & procedures have previously been implemented?
  • Where did they succeed or fail?
  • When did previous initiatives take place?
  • Why did they succeed or fail?
  • How were they received by the business?
Learn lessons from what has happened before, and use this historical analysis as a basis for strategic changes to how Data Quality is tackled in the future.

How will this exercise aid the future?

On your roadmap there may be an item such as "work with the business to create a common dictionary". Such an item may cause someone in the business to state: "We tried this before, and it didn't work".

Using the information gathered from our historical analysis of previous DQ initiatives, we can attempt to get to the root cause of why it didn't work. We can work with the business, gather their opinions, and move forward to creating a solution that does work. A solution which increases business confidence and allows us to achieve our strategic goals.

Thursday, 18 February 2010

Can motivations impact the state of data quality?

I recently finished reading a fascinating book about motivation, called Drive, written by Daniel H. Pink. The book guides us through the traditional motivators that drive performance and goes on to discuss a paradigm shift in the factors that motivate us in life and in the workplace.

The book got me thinking about data quality professionals and the things that motivate us within our work.

Let's look at an example

Within our organisation we have a team of enthusiastic, passionate data quality professionals.

Let's look at the possible motivations within their roles:

Intrinsic Motivators
  • They are passionate about Data Quality
  • They want to rid the organisation of poor quality data
  • They strive to be the data experts within their organisation
Extrinsic Motivators
  • They need to hit objectives set by managers (linked to annual bonus payment)
  • They need to ensure the organisation adheres to regulatory compliance
  • They have a pressure to deliver ROI to the organisation
Based on the motivators above, let's look at two differing scenarios:

Scenario #1

The DQ team is given free rein within the organisation. They listen to the business, focus on its key issues, solve a number of data quality issues and become trusted advisors. The intrinsic motivators of the team members are completely satisfied, and the team is heralded as a success by the business community, thereby demonstrating ROI.

Scenario #2

The DQ team, who report directly to the CFO, are engaged in a CFO-sponsored program to become financial data experts and to ensure that the organisation achieves regulatory compliance. This long-term program was initiated due to a number of data quality issues that risked compliance failure. The DQ team have been supplied with a checklist that, once adhered to, will ensure that the organisation is fully compliant and the team will be deemed a success.

So what about the motivations?

In both scenarios the team are satisfying both intrinsic and extrinsic motivators. Scenario #1 saw the DQ team particularly satisfy their intrinsic motivators, whereas in scenario #2, their extrinsic motivators were challenged to ensure that compliance was met.

How do you feel motivating factors impact upon the performance of DQ activity within an organisation?

DQ as a checklist

Within scenario #2 it was noted that the DQ team work to ensure that a checklist is satisfied, which will in turn ensure that the organisation achieves compliance.

By placing emphasis on the pressures of successfully completing a checklist, do we risk a negative impact on the performance of DQ efforts within an organisation?

Yes, completion of the checklist in the example above would ensure regulatory compliance, but will it come at the cost of poor DQ, and the potential of cutting corners within areas that do not have to adhere to compliance? Let me know your thoughts.

Sunday, 14 February 2010

A balanced approach to scoring data quality: Part 6

I wanted to spend a little bit of time concluding this series by discussing how we could visualise the example metrics that were discussed in the past few blog posts.


Wikipedia defines a Dashboard as "an executive information system user interface that (similar to an automobile's dashboard) is designed to be easy to read". This is exactly what we want to achieve when creating a dashboard to present our DQ metrics.

The below diagram shows an example layout for the dashboard:

The 'summary, actions & contact information' section is important, and is one we haven't previously discussed. It should contain commentary that adds further context to the four metric sections of the scorecard: a summary of what has happened in the past month or quarter (since the last scorecard was published), alongside the DQ Management actions to be undertaken in the coming weeks or months. Contact information should always be included so that readers know where to go with questions or for further assistance.

Remember the Metrics?

Within the previous posts in the series we looked at a number of example metrics which could be reported upon a DQ scorecard.

Now let's look at an example of how we could bring these metrics together onto our dashboard.

We can even drill down into aspects of the scorecard. For example, each metric within the 'Customer' section could offer the viewer of the dashboard an option to drill down for further insight into that particular metric. The same functionality could exist in the 'Data Dimensions' section: at a high level we can show a RAG status for each individual business department or process, and from there allow the viewer to drill down to see how that RAG status was derived and which dimensions require particular improvement or monitoring.
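As a rough illustration of that drill-down, the sketch below derives a RAG status for each dimension and rolls it up to a department. The thresholds (95% for Green, 85% for Amber), the department names, the scores and the "worst dimension wins" roll-up rule are all hypothetical assumptions; each organisation would agree its own with the business.

```python
# Hypothetical thresholds: >= 95% GREEN, >= 85% AMBER, otherwise RED.
GREEN_THRESHOLD = 95.0
AMBER_THRESHOLD = 85.0

# Ordering used to find the "worst" status when rolling up.
SEVERITY = {"RED": 0, "AMBER": 1, "GREEN": 2}

# Hypothetical % of records passing each dimension check, per department.
scores = {
    "Sales Reporting": {"Accuracy": 99.0, "Duplication": 91.0, "Conformity": 97.5},
    "Finance": {"Accuracy": 96.0, "Duplication": 98.0},
    "Marketing": {"Accuracy": 80.0, "Conformity": 96.0},
}


def rag(score):
    """Map a percentage score to a RAG status."""
    if score >= GREEN_THRESHOLD:
        return "GREEN"
    if score >= AMBER_THRESHOLD:
        return "AMBER"
    return "RED"


def drill_down(dimension_scores):
    """Detail view: the RAG status of each individual dimension."""
    return {dim: rag(score) for dim, score in dimension_scores.items()}


def department_rag(dimension_scores):
    """Summary view: a department shows its worst dimension's status (assumed rule)."""
    return min(drill_down(dimension_scores).values(), key=SEVERITY.__getitem__)


if __name__ == "__main__":
    for dept, dims in scores.items():
        print(dept, department_rag(dims), drill_down(dims))
```

Taking the worst status keeps a single failing dimension visible at the summary level; a weighted average would be an equally reasonable choice if the business prefers it.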

In Conclusion

The balanced approach to scoring data quality that has been discussed within this series can be used as both a vehicle to promote continuous improvement and as an effective performance management tool. I'd be keen to hear your stories of implementing DQ scorecards - successes, failures, lessons learned - so please leave a comment within the series, or contact me directly.

Related Posts

Part 1: Introduction

Part 2: Customer

Part 3: Internal Processes

Part 4: Financial

Part 5: Data Dimensions

Tuesday, 9 February 2010

A balanced approach to scoring data quality: Part 5

We've previously introduced the scorecard and discussed the 'customer', 'internal processes' and 'financial' perspectives. Now we're going to take a look at the final section of the scorecard, which deals with 'data dimensions'.

The sole reason I held this section back - preferring first to focus on other sections of the scorecard - is that this is the section you will be most familiar with. Profiling exercises have been the de facto standard for measuring data quality for many years. There are a number of tools on the market that will profile data in great detail. If you don't have the budget to buy a tool you could even build your own - Dylan Jones and Rich Murnane give us some great examples - which will allow you to profile your data against a multitude of dimensions of data quality.

What are dimensions of data quality?

When we talk about dimensions of data quality we are talking about aspects of quality such as accuracy, consistency, duplication, referential integrity and conformity. Some basic practical examples could include:
  • Table 'SALES' should contain only 1 record per 'TRANSACTION_ID'
  • Field 'AMOUNT' in 'SALES' table should always contain a numeric value
  • Field 'CUSTOMER_NAME' should never be NULL
  • Each record in table 'CUSTOMER' should have a corresponding record in our 'CUSTOMER_ADDRESSES' table
  • Field 'Post Code' in 'CUSTOMER_ADDRESSES' table should adhere to Royal Mail PAF Standards
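The first four example rules could be expressed as simple checks that count how many rows pass and how many are flagged for review. This is a minimal sketch over hypothetical in-memory rows, not a profiling tool; the sample data and helper names are assumptions, and the Post Code rule is left out because checking against PAF requires access to Royal Mail's Postcode Address File.

```python
from collections import Counter

# Hypothetical sample rows standing in for the tables named in the rules above.
sales = [
    {"TRANSACTION_ID": 1, "AMOUNT": "19.99"},
    {"TRANSACTION_ID": 2, "AMOUNT": "abc"},
    {"TRANSACTION_ID": 2, "AMOUNT": "5.00"},
]
customer = [
    {"CUSTOMER_ID": 10, "CUSTOMER_NAME": "A. Smith"},
    {"CUSTOMER_ID": 11, "CUSTOMER_NAME": None},
]
customer_addresses = [{"CUSTOMER_ID": 10, "POST_CODE": "SW1A 1AA"}]


def is_numeric(value):
    """True if the value parses as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def ok_vs_issues(rows, passes):
    """Return (rows that pass the rule, rows flagged for review)."""
    ok = sum(1 for row in rows if passes(row))
    return ok, len(rows) - ok


def profile():
    """Profile the sample tables against the example rules."""
    id_counts = Counter(row["TRANSACTION_ID"] for row in sales)
    address_ids = {row["CUSTOMER_ID"] for row in customer_addresses}
    return {
        # Duplication: only one record per TRANSACTION_ID
        "unique_transaction_id": ok_vs_issues(
            sales, lambda r: id_counts[r["TRANSACTION_ID"]] == 1),
        # Conformity: AMOUNT always holds a numeric value
        "numeric_amount": ok_vs_issues(sales, lambda r: is_numeric(r["AMOUNT"])),
        # Completeness: CUSTOMER_NAME is never NULL (None here)
        "customer_name_not_null": ok_vs_issues(
            customer, lambda r: r["CUSTOMER_NAME"] is not None),
        # Referential integrity: every customer has an address record
        "customer_has_address": ok_vs_issues(
            customer, lambda r: r["CUSTOMER_ID"] in address_ids),
    }


if __name__ == "__main__":
    for rule, (ok, issues) in profile().items():
        print(f"{rule}: {ok} OK, {issues} to review")
```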

By profiling data against our data quality dimensions we can ascertain the number of records that are deemed OK, against those that are deemed to contain potential issues to review. However, in order to maximise the benefit of a data profiling exercise we should look towards answering the following question:

How well are our data quality dimensions performing when aligned to business rules and expectations?

Why align to business rules and expectations?

Primarily because it allows us to add context to data. Context matters because it determines how the data is interpreted by the business and, ultimately, how it is used. What is deemed to be 'Good Quality Data' may differ between two business users, or two different business departments.

This is because each user or department may use data in a different way, and therefore have different requirements for what is essential to ensure that data is fit for its intended purpose.

How to ascertain?

To understand the rules and expectations that should be placed upon data to ensure that it is of good quality and fit for purpose, we will need to speak to the business. If you have a data steward network, or have previously identified the subject matter experts within the business, ask them. If you haven't yet identified the key stakeholders and subject matter experts for business data, this would be a great time to kick-start that exercise and reap instant benefits.
  • Ask the business for their critical data items.
  • Which data items do they need to fulfill their responsibilities?
  • What should these data items look like?
  • Which rules should be applied?
  • When should data be available?

A practical example

The Sales Reporting team provide sales information to the executive team on a weekly basis. Their reporting pack consists of reports such as:
  • Total Volume/Value of Sales
  • Sales Per Store
  • Sales Per Product
  • Top/Bottom Performing Sales Staff

A number of critical data items were identified in order for this reporting pack to be built upon good quality data. Based upon these critical data items we identified a number of rules and expectations that must be met during a data profiling exercise in order for Sales data to be deemed fit for purpose.
  • The sales transactions table should only contain 1 record per 'transaction_id'
  • The sales transactions table should contain data for the previous six months, up to and including the previous working day
  • Each sale must be related to a 'store_id' and 'seller_id' that can be referenced back to our 'stores' and 'sellers' reference data.
  • The 'sales_amount' should never be a negative figure
  • A seller should only ever be attached to one store at a time. If a seller moves to a new store the previous seller/store relationship should be end dated.

If a record adheres to all of the above rules we can flag it as ‘GREEN’, meaning that it meets business rules and expectations on data performance. Once the recordset has been aggregated we can measure the percentage of records that are of ‘Good Quality’ for Sales Reporting and visualise this on our scorecard.
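A minimal sketch of how the five sales rules might be applied record by record, flagging each sale and aggregating the percentage that is GREEN. The reference data, sample sales and helper names are hypothetical, as are three simplifying assumptions: failing records are labelled RED, six months is treated as 183 days, and bank holidays are ignored when finding the previous working day. The table-level side of the second rule (that data actually exists up to the previous working day) would need a separate check.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical reference data standing in for the 'stores' and 'sellers' tables.
stores = {"S1", "S2"}
sellers = {"P1", "P2"}
# Seller/store relationships: (seller_id, store_id, start_date, end_date).
# An end_date of None marks the current, open relationship.
seller_stores = [
    ("P1", "S1", date(2009, 1, 1), None),
    ("P2", "S1", date(2009, 1, 1), date(2009, 12, 31)),
    ("P2", "S2", date(2010, 1, 1), None),
]


def previous_working_day(today):
    """Most recent weekday before today (assumption: bank holidays ignored)."""
    day = today - timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


def stores_for_seller(seller_id, on):
    """Stores a seller was attached to on a given date."""
    return [store for (seller, store, start, end) in seller_stores
            if seller == seller_id and start <= on and (end is None or on <= end)]


def flag_sales(sales, today):
    """Flag each sale GREEN or RED and return (flagged rows, % GREEN)."""
    window_end = previous_working_day(today)
    window_start = today - timedelta(days=183)  # assumption: six months ~ 183 days
    id_counts = Counter(s["transaction_id"] for s in sales)
    flagged = []
    for s in sales:
        passes = (
            id_counts[s["transaction_id"]] == 1               # one record per transaction_id
            and window_start <= s["sale_date"] <= window_end  # inside the six-month window
            and s["store_id"] in stores                       # store_id found in 'stores'
            and s["seller_id"] in sellers                     # seller_id found in 'sellers'
            and s["sales_amount"] >= 0                        # never a negative figure
            and len(stores_for_seller(s["seller_id"], s["sale_date"])) == 1  # one store at a time
        )
        flagged.append({**s, "flag": "GREEN" if passes else "RED"})
    green = sum(1 for s in flagged if s["flag"] == "GREEN")
    percent_green = 100.0 * green / len(flagged) if flagged else 0.0
    return flagged, percent_green


def sale(tid, day, store, seller, amount):
    """Build one hypothetical sales transaction row."""
    return {"transaction_id": tid, "sale_date": day, "store_id": store,
            "seller_id": seller, "sales_amount": amount}


sample_sales = [
    sale("T1", date(2010, 2, 5), "S1", "P1", 100.0),  # passes every rule
    sale("T2", date(2010, 2, 8), "S2", "P2", -5.0),   # negative amount
    sale("T3", date(2010, 2, 9), "S1", "P1", 10.0),   # after the previous working day
    sale("T4", date(2010, 1, 15), "S9", "P1", 20.0),  # unknown store
    sale("T5", date(2010, 1, 20), "S2", "P2", 50.0),  # passes every rule
]

if __name__ == "__main__":
    rows, pct = flag_sales(sample_sales, today=date(2010, 2, 9))
    print([r["flag"] for r in rows], pct)
```

The percentage returned at the end is the figure that would be carried onto the scorecard for Sales Reporting.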

In Conclusion

The key takeaway point from this section of the scorecard is that we are attempting to benchmark data quality against business expectations. Profiling data quality without taking into account the context in which the data will be used may result in further data quality issues, and misuse of data.

In my next (and final) post of this series we'll go through an example of putting all the sections together into a usable dashboard for the business.

Related Posts

Part 1: Introduction

Part 2: Customer

Part 3: Internal Processes

Part 4: Financial

Part 6: The Dashboard

Thursday, 4 February 2010

A balanced approach to scoring data quality: Part 4

We've determined that the scorecard is all about continuous improvement. We've discussed measuring the customer perception, as well as measuring our performance against our internal processes. Today I wanted to cover the 'Financial' perspective of the scorecard, which will allow us to begin to demonstrate the Return on Investment from Data Quality Management.

Data Quality has often come under the spotlight as being a difficult discipline in which to demonstrate quantifiable financial value. Many people have had issues gaining funding for DQ initiatives because of the purely speculative nature of the ROI. The aim here is to discuss ways in which we can go beyond this speculation, and look to financially quantify data quality within an organisation.

Quantify This

Our Internal Processes are often hard to quantify in financial terms. For instance, a large amount of time and effort has been applied to ensure that the business community has a definitive business glossary, containing all the terminology and business rules that they use within their reporting and business processes. This has been published, and highly praised, throughout the organisation.

However, how can we begin to ascertain the true financial benefit of this activity? To do this we would need to interview the whole business community, asking them to cast their collective minds back to the world before the business glossary. Asking them:
  • How long did they spend chasing the correct definition?
  • How many reports did they generate with an incorrect definition?
  • How much scrap and rework did they undertake because of this incorrect definition?
It is generally accepted that an agreed and published business glossary is beneficial to the community, but putting financial benefit against it can be extremely difficult.

Quantify That

Two key metrics I want to delve into are:
  • Known cost of poor data quality
  • Known saving due to DQ Management
Both of these metrics are useful in financially quantifying Data Quality.

Known cost of poor data quality

I like to think of the 'known cost of poor data quality' as a reactive metric. What I mean by this is that the cost of poor data quality can only truly be ascertained after an issue has occurred. If an issue has not yet occurred, the cost can only be pure speculation. As part of our reaction to a data quality issue we should undertake an impact assessment.

This impact assessment will ask, among other things:
  • How long has the issue been a problem?
  • Who has it impacted during that time?
  • What workarounds were undertaken?
The answers to these questions will aid us in ascertaining a known cost of the issue.

An example of the cost

We discussed Business definitions in the previous section, so let's now take an example from another one of our Internal Processes: Data Quality Issue Resolution. The below issue was raised to a newly established Data Quality team by an MI analyst within a financial services organisation:

"We have an Issue with Product Names within our datamart. A large number of records are coming through with incorrect or empty product names, which is causing havoc with my reporting. The product code is correct, but the people I send my reporting pack to won't understand the code. I'm currently bringing the data from the datamart into an Access database, and joining the table to my lookup table that contains all the correct product codes/names. This is taking me about an hour a day because of the amount of data I have to import/export. It's been like this for 4 months but I didn't know who to contact about the issue. Thanks for your help!"

This issue was promptly resolved by ensuring that reference data was updated to reflect true products. But how can we begin to add a financial perspective to this issue?

Using ITJobsWatch, the British IT Jobs market tracker, I noticed that the average salary of an MI Analyst was £32,500. Based upon this salary I estimated the cost of an hour a day each working day for 4 months.

Monthly: £2,708
Weekly: £625 (based on 4.33 weeks in a month)
Daily: £125 (based on 5 working days per week)
Hourly: £16.66 (based on 7.5 hour working day)

4 months at 4.33 weeks = 17.32 weeks
5 days per week for 17.32 weeks = 86.6 days
1 hour a day for 86.6 days at £16.66 per hour = £1,442.75

In this example, the cost was caused by an analyst spending 86.6 hours firefighting instead of carrying out value-adding activity. If the MI Analyst hadn't reported the issue and had continued to firefight for a year, they would have spent 259.8 hours, or more than six working weeks, firefighting. A scary thought.
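The arithmetic above can be wrapped in a small function so the same estimate can be reused for other issues. The function name and parameters are hypothetical; the defaults mirror the assumptions in the text (4.33 weeks per month, 5 working days per week, a 7.5-hour working day). Because it carries full precision rather than rounding each intermediate figure, it gives about £1,444.44 for the four months, slightly above the hand-rounded £1,442.75, and about £4,333.33 for a full year.

```python
def firefighting_cost(annual_salary, months, hours_per_day=1.0,
                      weeks_per_month=4.33, days_per_week=5,
                      hours_per_working_day=7.5):
    """Return (hours lost, cost in £) for a recurring manual workaround."""
    hourly_rate = (annual_salary / 12 / weeks_per_month
                   / days_per_week / hours_per_working_day)
    working_days = months * weeks_per_month * days_per_week
    hours_lost = working_days * hours_per_day
    return hours_lost, hours_lost * hourly_rate


if __name__ == "__main__":
    # £32,500 salary, one hour a day, for 4 months and for a full year.
    for months in (4, 12):
        hours, cost = firefighting_cost(32_500, months)
        print(f"{months} months: {hours:.1f} hours, £{cost:,.2f}")
```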

Known saving due to DQ Management

'Known saving due to DQ Management', on the other hand, is a proactive metric for quantifying financial benefit. It measures the savings garnered by DQ management efforts that capture problems before they become issues.

Caution should be taken to ensure that speculative savings are not mistaken for known savings.

For instance, you could speculate that because a data quality issue relating to customer address details - that would have impacted marketing, billing, customer care and customer complaints - was fixed prior to impacting customer mailouts, you saved the organisation £5,000,000. Not to mention the potential bad publicity and customer churn. What qualifies you to suggest this monetary value?

The best way to ensure that this metric relates to true savings is to ensure that DQ management efforts are closely aligned to business processes.

An example of the saving

For instance, the Data Quality Firewall initiative that I wrote about previously discovered that a UK retail bank were about to overpay their staff by £200,000 in sales incentive payments due to duplicate sales transactions in their processing tables. The initiative resulted in these duplicate records being captured before incentive payments were calculated. Our DQ initiative saved the organisation £200,000 by performing one simple data profiling technique (FACT). Not to mention the savings in scrap & rework, and the potential trade union or media involvement that a mistake, and the subsequent clawing back of employees' take-home pay, could provoke (SPECULATION).
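One way to sketch how duplicate detection turns into a known saving: calculate incentives with and without the duplicate transactions and take the difference. The incentive scheme (a flat percentage of each seller's sales), the field names and the sample data are hypothetical assumptions, not the bank's actual process. Only that difference counts as the known saving; any knock-on effects remain speculation.

```python
from collections import defaultdict
from decimal import Decimal


def incentives(transactions, rate):
    """Incentive owed per seller: rate x total sales (hypothetical scheme)."""
    totals = defaultdict(Decimal)
    for t in transactions:
        totals[t["seller_id"]] += t["amount"] * rate
    return dict(totals)


def remove_duplicates(transactions):
    """Keep only the first record for each transaction_id."""
    seen, unique = set(), []
    for t in transactions:
        if t["transaction_id"] not in seen:
            seen.add(t["transaction_id"])
            unique.append(t)
    return unique


def known_saving(transactions, rate):
    """Overpayment avoided by de-duplicating before calculating incentives."""
    before = sum(incentives(transactions, rate).values(), Decimal(0))
    after = sum(incentives(remove_duplicates(transactions), rate).values(), Decimal(0))
    return before - after


sample = [
    {"transaction_id": "A1", "seller_id": "P1", "amount": Decimal("1000")},
    {"transaction_id": "A2", "seller_id": "P1", "amount": Decimal("2000")},
    {"transaction_id": "A2", "seller_id": "P1", "amount": Decimal("2000")},  # duplicate
    {"transaction_id": "A3", "seller_id": "P2", "amount": Decimal("500")},
]

if __name__ == "__main__":
    print(known_saving(sample, Decimal("0.05")))
```

`Decimal` is used rather than `float` so that monetary amounts are not subject to binary rounding errors.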

In Conclusion

When looking at measuring Data Quality from a financial perspective it is important to look at it from the perspective of both the 'known cost of poor data quality' and the 'known saving due to DQ Management'.

We know, and accept, that it may not be possible to truly quantify all aspects of data quality management. However, starting to quantify data quality in terms of costs & savings wherever you can will help to raise the profile of both your data quality management activities and the need for fit-for-purpose data within your organisation.

Related Posts

Part 1: Introduction
Part 2: Customer
Part 3: Internal Processes
Part 5: Data Dimensions
Part 6: The Dashboard

Monday, 1 February 2010

A balanced approach to scoring data quality: Part 3

Today I wanted to discuss the ‘Internal Processes’ section of the scorecard. In case you missed the previous parts of the series, you can read the introduction in Part 1, and about the ‘customer’ section in Part 2.

Perform, Perform, Perform

Throughout the business world people are measured upon their performance. How well do they carry out their responsibilities? Do they hit their objectives? Do they adhere to any applicable SLAs? The Data Quality team should be no different and we should look towards measuring our performance against our internal processes.

Our Internal processes are the procedures and tasks we follow to ensure data quality is managed, and communicated, throughout the business community.

Consider the following Internal Processes:

  • Publishing and Review of a Business Terminology / Data Dictionary
  • Resolution and Communication of DQ issues in a timely manner
  • Identification of appropriate system, data & report ownership

All of the above are critical processes within the day-to-day responsibilities of a Data Quality team. If we under-perform in delivering any of these processes, it will have a knock-on impact on how data quality management is delivered within an organisation. In some cases, poor performance within our internal processes could even be a contributing factor to poor data quality.

For example, a Product Manager has noticed that sales data for their product is not accurate in the data warehouse, and has raised a data quality issue with your team. The data warehouse is also used by the Finance team, and is currently being used to provide financial figures for a last-minute board meeting. The data quality issue was raised yesterday and is being investigated, but there has been no communication to the business community to advise them of it. The board of directors are now looking at inaccurate data, questioning the figures and wondering whether they can trust the data.

How can we measure our Internal Processes?

We can measure the performance of our internal processes by benchmarking them against our objectives, or against targets based upon our objectives. As an example, let’s take the process of ‘Resolution and Communication of DQ issues in a timely manner’.


All known Data Quality issues should be immediately communicated to the business community, and be resolved within 3 days of being raised


DQ Issues raised – 125
DQ Issues resolved within 3 days – 70 (56%)
Issues communicated to community – 100 (80%)

Upon seeing the measures above, we could ask:

“Why were only 56% of DQ issues resolved within our target time period? Do we need to involve more resources to fix issues? Do we need to adjust the target SLA?”


“All issues were due to be communicated to the business community immediately. Why were 25 issues not communicated? Do we need to set up reminders? Was no one able to pick up the issues?”
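Measures like those above could be produced directly from an issue log. This is a minimal sketch under stated assumptions: the `DQIssue` record, its fields and the sample log are hypothetical, and "within 3 days" is taken to mean at most 3 calendar days between the raised and resolved dates. The sample log is built to reproduce the 125 / 70 / 100 figures.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DQIssue:
    raised: date
    resolved: Optional[date]  # None while the issue is still open
    communicated: bool


def sla_measures(issues, sla_days=3):
    """Counts and percentages for the 'resolve and communicate' objective."""
    raised = len(issues)
    on_time = sum(1 for i in issues
                  if i.resolved is not None
                  and (i.resolved - i.raised).days <= sla_days)
    communicated = sum(1 for i in issues if i.communicated)

    def pct(n):
        return round(100 * n / raised, 1) if raised else 0.0

    return {
        "raised": raised,
        "resolved_within_sla": (on_time, pct(on_time)),
        "communicated": (communicated, pct(communicated)),
        "not_communicated": raised - communicated,
    }


# Hypothetical log: 125 issues, 70 resolved in 2 days and the rest in 5 days,
# with the first 100 communicated to the business community.
start = date(2010, 1, 4)
sample_log = [DQIssue(start, start + timedelta(days=2 if n < 70 else 5), n < 100)
              for n in range(125)]

if __name__ == "__main__":
    print(sla_measures(sample_log))
```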

In Conclusion

As Satesh suggested in a comment on my previous post: “What gets measured improves”. This is exactly what we are trying to achieve with a scorecard. Poor performance within our internal processes could have a knock-on effect on our customers' perception of DQ management. Therefore, a process of continuous measurement, analysis and improvement is required to ensure that we do not get complacent and adopt poor DQ Management habits.

The next post in this series will deal with the ‘Financial’ section of the scorecard, and we’ll look into how we can begin to measure the financial impact that DQ management can have on an organisation.

Related Posts

Part 1: Introduction
Part 2: Customer
Part 4: Financial
Part 5: Data Dimensions
Part 6: The Dashboard