Thursday, 29 December 2011

Understanding, Improving & Controlling the Data Landscape - Part 1

I have been involved in a number of different data-related projects where the business problem can be summarised in a similar way.

"We have poor quality and understanding of data within our organisation/a key business process, and we require confidence in the data we are producing"

In these types of projects the approach has been fairly standardised. The first step is to gain an Understanding of the current data landscape. Once understanding has been achieved, the next step is to go about Improving the landscape. Once improvements have been implemented, the emphasis is on Controlling the improvement to ensure that the measures implemented remain in place and that the project achieves a successful outcome.

Within this post I want to outline the core components of Understanding the data landscape. Each has equal importance and should be seen as a complementing partner to the next. I would consider an exercise where only one component is achieved to be lacking in the aim of providing a complete understanding of the data landscape.

What are the key components within the understanding phase?

Data Usage

How is data currently being consumed within the organisation? Which data warehouses, reports or operational systems are currently being used to provide insight into sales, risk or performance? The idea is to identify both which sources are being used and who is using them. The aim of this exercise is to develop a high-level picture of the systems that are critical to the provision of data within an organisation, as well as the scope of this provision.

Data Mapping

After completing the above exercise we know how data is being used within our organisation, but we may not yet know how the data on a report is derived. The aim of the Data Mapping exercise is to understand the lifecycle of data, from its creation at source all the way to its appearance on a report. We need to understand how data flows through systems and what happens to the data at each stage of the journey. For instance, how is it received, what transformation is undertaken, and how is it loaded? Is it subject to any standardisation or additional business rules, and is it aggregated at any stage? The results from this process will provide both a graphical and a detailed understanding of data as it passes through the organisation, and how it is touched along the way.
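
As a rough illustration, the output of a mapping exercise can be recorded as a list of steps and then walked backwards from a report field to its sources. This is only a minimal sketch: the field names, systems and rules below are hypothetical examples, not taken from any real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingStep:
    target: str    # field produced by this step
    sources: tuple  # fields the target is derived from
    system: str    # where the step happens
    rule: str      # what is done to the data at this stage


# Hypothetical mapping for a single figure on a sales report.
STEPS = [
    MappingStep("staging.sale_amount", ("crm.order_value",),
                "staging load", "received as an extract, loaded unchanged"),
    MappingStep("warehouse.sale_amount_gbp",
                ("staging.sale_amount", "staging.currency"),
                "warehouse", "standardised to a single currency"),
    MappingStep("report.monthly_sales", ("warehouse.sale_amount_gbp",),
                "sales report", "aggregated by month"),
]


def trace_to_source(field, steps):
    """Return every mapping step that contributes to `field`."""
    by_target = {step.target: step for step in steps}
    lineage, pending, seen = [], [field], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        step = by_target.get(current)
        if step is None:
            continue  # nothing upstream is recorded, so this is a source field
        lineage.append(step)
        pending.extend(step.sources)
    return lineage


def source_fields(field, steps):
    """Return the original source fields that feed `field`."""
    targets = {step.target for step in steps}
    return sorted({src
                   for step in trace_to_source(field, steps)
                   for src in step.sources
                   if src not in targets})


for step in trace_to_source("report.monthly_sales", STEPS):
    print(f"{step.target} <- {', '.join(step.sources)} [{step.system}: {step.rule}]")
```

Keeping the rule alongside each step means the same record can feed both the graphical view (targets and sources) and the detailed view (what happens at each stage).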

Data Profiling

The profiling of data will help us to further understand how it is structured, how it adheres to standards and to identify any potential data quality issues which may impact the accuracy of reporting further down the line. The process of Data Profiling is the technical accompaniment to the Data Usage and Data Mapping exercises highlighted above, and is necessary in order to provide a concise picture of the data landscape.

In future blog posts I will cover methods that can be utilised to present this information to business users, as well as methods to prioritise focus areas which will influence and form the next stage of the project: The Improving Stage.

Wednesday, 5 October 2011

Getting the most out of your audience

Forums, whether called a 'Data Governance Forum', 'Data Stakeholder Forum' or 'Data Quality Forum', are generally a good way to aid communication between your Data Quality Function and the wider business community. Key data stakeholders from throughout an organisation meet periodically to discuss challenges, issues, roadmaps and strategic data alignment.

How to conduct a forum and what the scope of these meetings should be are in themselves items of great debate. However, here are three simple suggestions to improve the chances of success:

  • Don't use the forum to dictate your vision. The focus of the forum should be on discussion, with the aim of building the co-operation that will enable you to realise your vision as a Data Quality leader. If you walk into your first meeting stating that the business are accountable for the poor quality data in the organisation, and that they need to start taking ownership and responsibility to improve it, chances are your next forum will be sparsely attended. Instead of dictating strategic direction, learn to understand the business challenges and the history of data use within the organisation. You are not going to realise your vision without the support of the wider community.

  • Don't blind your audience. If your audience (primarily the business) don't understand you, they will feel confused and distance themselves from your forum. Know your audience, identify and understand their needs, and speak in a language they understand. If you start talking about Informatica Mappings, Data Types and Levenshtein Distance, chances are the business attendees will see this as a technology forum and neglect to attend your next meeting.

  • Use the forum as a means of gaining feedback. What's going well? What isn't going well? What could we do better? The idea is to receive feedback and constructive criticism from your stakeholders in order to improve the service that you provide. Often it is only when the business have confidence in your service, and in your ability to support them in business-as-usual activities, that they will buy in to your strategic data management initiatives.

Wednesday, 28 September 2011

Assessing the Plot

If you were ever to undertake the task of building a house on an empty plot of land there are a number of factors that you would want to consider.

Before you even start to lay the important foundations you would want to undertake a survey of the Ground Quality. This would typically involve a bore hole analysis to assess the type and quality of the ground. Alongside the bore hole analysis you'd want to analyse the landscape of the plot. For example, are there tree roots close to the planned structure that could impact foundations? Does the plot have sufficient access to required services such as water and sewerage?

In the world of construction, if you want to ensure that you are building something which will be successful, have longevity and contain no hidden surprises, you would ensure that the ground work is undertaken in advance.

In Data Migrations, CRM Implementations and Data Warehouse projects you often read assumptions in project documents such as 'the data is assumed to be fit for purpose and adhering to the relevant business standards'. How often is this assumption found to be incorrect, leading to delayed project completion or poor user adoption?

An equivalent of the bore hole analysis would be a data profiling exercise that ascertained what the key data items were, what good quality data looks like and how data performs against these expectations.
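
As a simple sketch of what "performing against expectations" could look like in practice, the example below scores a handful of records against a rule per key data item. The fields, rules and sample records are hypothetical; in a real project the definition of good quality would be agreed with the business rather than assumed.

```python
# Hypothetical expectations for three key data items.
RULES = {
    "customer_id": lambda v: v.isdigit(),
    "email": lambda v: "@" in v and "." in v.split("@")[-1],
    "country": lambda v: v in {"GB", "IE", "FR"},
}


def score_against_expectations(records, rules):
    """Return the share of records that meet each field's rule."""
    results = {}
    for field, is_valid in rules.items():
        # Treat missing values as empty strings, which fail every rule here.
        passed = sum(1 for record in records
                     if is_valid((record.get(field) or "").strip()))
        results[field] = passed / len(records) if records else 0.0
    return results


# Hypothetical sample taken from the data to be migrated.
sample = [
    {"customer_id": "1001", "email": "a@example.com", "country": "GB"},
    {"customer_id": "10x2", "email": "b@example", "country": "IE"},
    {"customer_id": "1003", "email": "", "country": "UK"},
    {"customer_id": None, "email": "d@example.org", "country": "FR"},
]

for field, rate in score_against_expectations(sample, RULES).items():
    print(f"{field}: {rate:.0%} of records meet expectations")
```

A table of pass rates like this gives the project something concrete to test the 'fit for purpose' assumption against before the foundations are laid.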

An equivalent to a landscape analysis would be to ensure that the system architecture can support both current and future demands, that the right people are in place, and that any required change could be easily undertaken.

These items are key components to a successful implementation but all too often they do not get the time that they deserve on the project plan.

Why are we not taking the opportunity during these transformational projects to question the data and architecture that is relied upon to make the project successful? Why do we all too often assume?