I have been involved in a number of different data-related projects where the business problem can be summarised in a similar way:
"We have poor quality and understanding of data within our organisation (or within a key business process), and we require confidence in the data we are producing."
Within these types of projects the approach has been fairly standardised. The first step is to gain an Understanding of the current data landscape. Once that understanding has been achieved, the next step is Improving the landscape. Once improvements have been implemented, the emphasis moves to Controlling them, to ensure that the measures remain in place and that the project delivers a successful outcome.
In this post I want to outline the core components of Understanding the data landscape. Each has equal importance and should be seen as a complement to the others. I would consider an exercise where only one component is completed to fall short of the aim of providing a complete understanding of the data landscape.
What are the key components within the understanding phase?
The first component is Data Usage. How is data currently being consumed within the organisation? Which data warehouses, reports or operational systems are currently being used to provide insight into sales, risk or performance? The idea is to identify both which sources are being used and who is using them. The aim of this exercise is to build a high-level picture of the systems that are critical to the provision of data within the organisation, as well as the scope of that provision.
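To make the idea of a usage inventory concrete, here is a minimal sketch in Python. The system names, consumers and purposes are entirely hypothetical, and in practice this information would come from interviews or access logs rather than a hard-coded list. It simply groups consumers by source and ranks sources by how many distinct consumers depend on them.

```python
from collections import defaultdict

# Hypothetical usage records: (source system, consumer, purpose).
USAGE = [
    ("sales_dw", "finance_team", "sales"),
    ("sales_dw", "regional_managers", "performance"),
    ("risk_mart", "risk_team", "risk"),
    ("crm", "regional_managers", "sales"),
    ("sales_dw", "risk_team", "risk"),
]


def consumers_by_source(records):
    """Group the distinct consumers of each source system."""
    result = defaultdict(set)
    for source, consumer, _purpose in records:
        result[source].add(consumer)
    return dict(result)


def rank_sources(records):
    """Order sources by number of distinct consumers (most first),
    breaking ties alphabetically so the result is deterministic."""
    by_source = consumers_by_source(records)
    return sorted(by_source, key=lambda s: (-len(by_source[s]), s))


if __name__ == "__main__":
    for source in rank_sources(USAGE):
        print(source, sorted(consumers_by_source(USAGE)[source]))
```

A ranking like this is only a rough first indicator of which systems are critical; a source with a single consumer may still feed a key regulatory report.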
After completing the above exercise we know how data is being used within our organisation, but we may not yet know how the data on a report is derived. The aim of the Data Mapping exercise is to understand the lifecycle of data, from its creation at source all the way to its appearance on a report. We need to understand how data flows through systems and what happens to it at each stage of the journey. For instance, how is it received, what transformation is undertaken, and how is it loaded? Is it subject to any standardisation or additional business rules, and is it aggregated at any stage? The results of this process will provide both a graphical and a detailed understanding of data as it passes through the organisation, and of how it is touched along the way.
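As a rough illustration of what a data mapping can look like once captured, the sketch below records each hop as "target field, upstream field, what happens at this stage" and walks back from a report field to its source. All field names and transformation notes are made up for the example, and a real mapping would usually allow several upstream fields per target.

```python
# Hypothetical lineage: each field maps to (upstream field, what happens here).
LINEAGE = {
    "report.monthly_sales": ("dw.fact_sales.amount", "aggregated by month"),
    "dw.fact_sales.amount": ("staging.orders.amount", "currency standardised"),
    "staging.orders.amount": ("crm.orders.amount", "received as nightly extract"),
}


def trace_to_source(field, lineage):
    """Walk upstream from a field, returning (field, upstream, step) hops."""
    path = []
    seen = set()
    while field in lineage:
        if field in seen:
            # Guard against a mapping that accidentally loops back on itself.
            raise ValueError(f"cycle in lineage at {field}")
        seen.add(field)
        upstream, step = lineage[field]
        path.append((field, upstream, step))
        field = upstream
    return path


if __name__ == "__main__":
    for target, upstream, step in trace_to_source("report.monthly_sales", LINEAGE):
        print(f"{target} <- {upstream}: {step}")
```

Keeping the mapping in a simple structure like this makes it easy to generate both the detailed listing and a diagram from the same record.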
Data Profiling helps us to further understand how data is structured and how well it adheres to standards, and to identify any potential data quality issues which may impact the accuracy of reporting further down the line. It is the technical accompaniment to the Data Usage and Data Mapping exercises described above, and is necessary in order to provide a complete picture of the data landscape.
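A very basic profile of a single column might look like the sketch below. It is an illustration under stated assumptions rather than a full profiling tool: the sample values are invented, empty strings are treated as missing, and the postcode pattern is a deliberately simplified stand-in for whatever standard the data is meant to follow.

```python
import re


def profile_column(values, pattern=None):
    """Return simple profile statistics for one column of values.

    None and empty strings are counted as nulls (an assumption; adjust
    to match how missing data is represented in the source system).
    """
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    profile = {
        "total": total,
        "nulls": total - len(non_null),
        "distinct": len(set(non_null)),
    }
    if pattern is not None:
        regex = re.compile(pattern)
        # Values that are present but do not match the expected format.
        profile["non_conforming"] = [
            v for v in non_null if not regex.fullmatch(str(v))
        ]
    return profile


# Invented sample data and a simplified, illustrative postcode format.
POSTCODES = ["EH1 1AA", "eh1 1aa", None, "G1 2FF", "", "EH1 1AA"]
SIMPLE_POSTCODE = r"[A-Z]{1,2}\d{1,2} \d[A-Z]{2}"

if __name__ == "__main__":
    print(profile_column(POSTCODES, SIMPLE_POSTCODE))
```

Even counts as simple as these tend to surface questions worth taking back to the business, such as whether a lower-case postcode is a genuine quality issue or just a formatting difference between systems.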
In future blog posts I will cover methods that can be utilised to present this information to business users, as well as methods to prioritise focus areas which will influence and form the next stage of the project: The Improving Stage.