Data Audit

What is a Data Audit?

A data audit is a process of reviewing the data you hold in order to understand its potential and to identify the steps needed now to improve data quality going forward.

Typically a data audit is done as a precursor to another project. It is also a good starting point for working together, and it often produces some interesting findings.

Why do one?

Anybody trying to interpret a report or insight should have a good understanding of the data behind it, whether they are building a complex model or segmentation or just providing a report.

It is remarkable what you find when you look closely at your data. Here are a few examples of what we have found in the past:

  • A multi-national beauty product company with a retail presence selling promotional cookies – Their best customer was an administration assistant for a local company who bought their promotional cookies (not even a beauty purchase) 220 times in a single year. That is almost every day! This level of transactions would inflate the average number of transactions in any of their reports (normally there would only be 2-3 purchases on average).
  • An online furniture company with a highly engaged browser – Their most engaged customer visited the site (a visit, not a page view) over 2000 times in a 3 month period. They would switch between three devices as they commuted, were at work, and were at home. A handful of extreme browsers contributed 10% of their overall traffic.
  • 50% of web traffic was generated by a bot for an asset manager – Some monitoring software can cause website tracking tools to fire. Given that such monitors are often set to check the site every minute, that would generate roughly 500k new visitors and visits per year (60 × 24 × 365 = 525,600 checks). Quite a difference!
  • A pub chain manager doesn’t play fair – A pub loyalty scheme was being abused by a member of staff who would buy & return a bottle of whiskey to then be able to redeem points at a later date for free or cheap beer.
  • An international publisher over-estimated users sevenfold – A very well known children’s publisher had an estimated 20m users. However, many people had multiple accounts, and those accounts had multiple email addresses. This mess hid the fact that there were actually only about 3 million unique accounts.
  • A publisher reading a usage report incorrectly – An online publisher was looking at how to measure the engagement of its users. However, the approach to running “ad-hoc” counts was wrong, causing the report to double count some of the visits.
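
To illustrate how a single extreme customer, like the cookie buyer above, can distort a reported average, here is a minimal Python sketch. The purchase counts are made up for illustration, and the function name `summarize_transactions` is hypothetical; it simply compares the mean with the median and with the mean after dropping the largest value.

```python
from statistics import mean, median


def summarize_transactions(counts):
    """Compare the mean with the median and with the mean excluding the top value."""
    trimmed = sorted(counts)[:-1]  # drop the single largest value
    return {
        "mean": mean(counts),
        "median": median(counts),
        "mean_without_top": mean(trimmed),
    }


# Hypothetical yearly purchase counts: nine typical customers and one extreme buyer
purchases = [2, 3, 2, 3, 2, 3, 2, 3, 2, 220]
summary = summarize_transactions(purchases)
print(summary)
```

In this made-up sample the mean is roughly ten times the median, which is the kind of gap that suggests a handful of records deserve a closer look before the average goes into a report.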

A very common area for discrepancies is conversion reconciliation. Have you ever compared the number of “conversions” in a source like Google Analytics to what actually comes through in the back end?

These numbers can vary wildly because:

  • Tags are missing on some pages
  • Tags are over-fired (they fire more than once for the same conversion)
  • The transaction is complex, as with a stock broker, and basic tracking is just not appropriate for it
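
A reconciliation of this kind could be sketched as follows. This assumes both sources can be exported as lists of a shared transaction ID, which is an assumption and not something every tracking setup provides. The function name `reconcile_conversions` and the sample IDs are hypothetical.

```python
from collections import Counter


def reconcile_conversions(analytics_ids, backend_ids):
    """Compare conversion IDs seen by the tracking tool with those in the back end."""
    analytics_counts = Counter(analytics_ids)
    analytics_set = set(analytics_counts)
    backend_set = set(backend_ids)
    return {
        # In the back end but never tracked: a tag may be missing
        "missing_from_analytics": sorted(backend_set - analytics_set),
        # Tracked more than once: a tag may be over-firing
        "over_fired": sorted(i for i, n in analytics_counts.items() if n > 1),
        # Tracked but not in the back end: worth investigating separately
        "unknown_to_backend": sorted(analytics_set - backend_set),
    }


# Hypothetical exports from the two systems
analytics = ["A1", "A2", "A2", "A4", "A9"]
backend = ["A1", "A2", "A3", "A4"]
report = reconcile_conversions(analytics, backend)
print(report)
```

Splitting the differences into these three groups makes it easier to tell a missing tag from an over-fired one, rather than only seeing that the two totals disagree.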

As well as finding these nuances, which can be handled once they are identified, the audit process helps identify opportunities within the data. Here are a few examples of things we could do to improve our understanding of the data at hand:

  • Deriving gender from the first name – Our client hadn’t captured much demographic information. By inferring gender from the first name, we could add a bit of color to different customer segments
  • GEO cleansing – Manually entered names and addresses are often hard to map. With a few processes we can cleanse a sample of these to give an idea of how quickly the rest could be populated
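
As a rough sketch of both ideas, the Python below infers a gender label from a first name using a small lookup table and tidies a manually entered address. The lookup table, the function names and the sample values are all hypothetical; a real project would need a far larger reference list, and any name not found should stay "unknown" rather than be guessed.

```python
# Illustrative lookup only; a real reference list would be much larger
NAME_TO_GENDER = {
    "emma": "female",
    "olivia": "female",
    "james": "male",
    "oliver": "male",
}


def derive_gender(full_name):
    """Return a gender label inferred from the first name, or 'unknown'."""
    parts = full_name.strip().split()
    if not parts:
        return "unknown"
    return NAME_TO_GENDER.get(parts[0].lower(), "unknown")


def tidy_address(raw):
    """Collapse repeated whitespace and upper-case a manually entered address."""
    return " ".join(raw.split()).upper()


print(derive_gender("  Emma Smith"))
print(derive_gender("Sam Jones"))
print(tidy_address(" 10  high   street "))
```

Derived fields like these are best treated as approximate segment-level color, not as facts about any individual customer.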

What’s included?

The data audit varies depending on the type of data. Typically we try to review:

  1. What is captured – a description of the different types of data we have at hand and how it is stored
  2. Data quality – Are the values reasonable, e.g. how many people are over 100? Can we compare them to another data source to check & validate them?
  3. Population of variables – Is each variable filled in often enough to be useful?
  4. Cleansed vs uncleansed KPIs – If we were to clean the data, how would the KPIs change?
  5. What can we do with the data
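
Items 2 to 4 can be sketched for a single field as shown below. This is a minimal illustration, assuming records arrive as a list of dictionaries and that an age above 100 counts as unreasonable; the function name `audit_field` and the sample records are hypothetical.

```python
def audit_field(records, field, is_valid):
    """Report population rate, invalid values, and the mean before and after cleansing."""
    values = [r.get(field) for r in records]
    populated = [v for v in values if v is not None and v != ""]
    valid = [v for v in populated if is_valid(v)]
    total = len(values)
    return {
        # Share of records where the field is filled in at all
        "population_rate": len(populated) / total if total else 0.0,
        # Populated values that fail the reasonableness check
        "invalid_count": len(populated) - len(valid),
        # KPI before cleansing
        "raw_mean": sum(populated) / len(populated) if populated else None,
        # Same KPI after removing unreasonable values
        "cleansed_mean": sum(valid) / len(valid) if valid else None,
    }


# Hypothetical customer records: one missing age, one implausible age
customers = [{"age": 34}, {"age": 51}, {"age": None}, {"age": 130}, {"age": 25}]
result = audit_field(customers, "age", lambda a: 0 <= a <= 100)
print(result)
```

Reporting the raw and cleansed figures side by side shows how much a KPI depends on a few questionable values, which helps decide whether cleansing is worth the effort.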

The data audit is a useful tool to help show where you currently stand, but its scope should be balanced against how you actually intend to use the data.

What does it lead to?

Data audits are used for a variety of data types. Typically an audit could lead into a:

  1. Targeting project, to build predictive models and segments to better communicate with individuals or groups of people
  2. Insight project, using the data to understand your customers and their behavior
  3. Reporting project, to ensure that you have a robust & true data set for making decisions internally