A Framework to Assess Healthcare Data Quality

1. Introduction

Data quality assessment is a fundamental task when undertaking research. A wealth of healthcare data provided by the National Health Service is available and can easily be accessed and used for research. Yet even though health-related datasets are obtained from authoritative sources, quality issues may still be present in the data. Data quality issues can lead to an array of errors in research findings, including incorrect demographic information and exaggerated estimates of disorder prevalence. Moreover, the consequences of decisions made from inaccurate results can be damaging to organisations within the healthcare sector (Goodchild, 1993). It is therefore important to use a framework to assess the quality of data obtained from the data mining process; this helps determine whether the data can be used to test hypotheses and increases confidence in the validity of any findings.

The motivation for the current research was to investigate the data quality issues encountered during a research report undertaken by the NHS Coventry and Warwickshire Partnership Trust entitled 'Up Skilling the Adult Mental Health Workforce in Psychological Practice Skills'. As the researchers had access to a wealth of data from several sources, it was important to examine the data available to the research and establish what data quality criteria would be necessary to draw conclusions on its suitability. Because many of the available datasets had not been collected with a specific research question in mind, the selection quality and collection methods were not under the researchers' control and were therefore difficult to validate (Sorensen, Sabroe & Olsen, 1996). From this arose the need to construct a robust framework for assessing data quality, which led to a review of existing frameworks and the formation of a new framework specific to this research.

2. Review of Quality Frameworks

The existing literature indicates that the criteria for a quality framework must be general, applicable across application domains and data types, and clearly defined (Price & Shanks, 2004). Eppler (2001) put forward that quality frameworks should show the interdependencies between different quality criteria, so that researchers become familiar with how data quality issues in one criterion impact others.

The Data Quality Assessment Methods and Tools (DatQAM) handbook provides a systematic approach to implementing data quality assessment, with a range of quality measures that build on the strengths of official statistics. It covers user satisfaction concerning relevance, sampling and non-sampling errors concerning accuracy, production dates concerning timeliness, the availability of metadata and forms of dissemination, changes over time and geographical differences, and coherence (Eurostat, 2007).

The Quality Assurance Framework (QAF) developed by Statistics Canada (2010) includes a number of measures for assessing data quality, covering timeliness, relevance, interpretability (completeness of metadata), accuracy (coefficient of variation, imputation rates), coherence and accessibility. These two data quality (DQ) frameworks are similar in that each pairs its quality criteria with concrete measures for assessing them.
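To make the QAF's accuracy measures concrete, the sketch below computes a coefficient of variation and an imputation rate for a small, hypothetical dataset. The column names, values and imputation flags are illustrative assumptions, not drawn from either framework or from the study's data.

    import pandas as pd

    def coefficient_of_variation(series: pd.Series) -> float:
        # Relative dispersion of an estimate: standard deviation over mean.
        return series.std() / series.mean()

    def imputation_rate(flags: pd.Series) -> float:
        # Proportion of values that were imputed rather than observed.
        return flags.sum() / len(flags)

    # Hypothetical records; 'waiting_days' and its imputation flag are invented.
    records = pd.DataFrame({
        "waiting_days": [12, 30, 25, 41, 18, 22],
        "waiting_days_imputed": [False, True, False, False, True, False],
    })

    print(f"CV: {coefficient_of_variation(records['waiting_days']):.3f}")
    print(f"Imputation rate: {imputation_rate(records['waiting_days_imputed']):.1%}")

A low coefficient of variation indicates a stable estimate, while a high imputation rate signals that accuracy rests heavily on modelled rather than observed values.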
Both frameworks are also widely used; for example, the HSCIC uses DatQAM for its data quality assessments (HSCIC, 2013). To build a framework that incorporates measures for DQ, we can therefore draw on these two frameworks and on how the criteria are measured within them, yielding a comprehensive framework that can be applied to the data used within our research. The measures in our framework have been adapted from DatQAM and the QAF in order to quantify our data quality assessments.

Furthermore, the World Health Organisation's (WHO) 'quality criteria' were used to categorise the quality measurements. The Data Quality Audit Tool (DQAT) is used by the WHO and the Global Fund; after cross-referencing, it was decided that a 'confidentiality' criterion, adapted from the DQAT (2008), should be added to the framework. …
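One way to organise such a framework in code is as a mapping from quality criteria to the measures that quantify them, so that a criterion such as confidentiality can be added simply by registering a measure under a new key. The sketch below uses criteria named above, but the measure implementations and the identifier list are illustrative placeholders, not the study's actual metrics.

    from typing import Callable, Dict, List
    import pandas as pd

    QualityMeasure = Callable[[pd.DataFrame], float]

    def missing_rate(df: pd.DataFrame) -> float:
        # Proportion of cells with no recorded value.
        return df.isna().to_numpy().mean()

    def identifier_exposure(df: pd.DataFrame) -> float:
        # Share of columns matching a (hypothetical) list of direct identifiers.
        identifiers = {"nhs_number", "name", "postcode"}
        return len(identifiers & set(df.columns)) / len(df.columns)

    # Each criterion maps to the measures used to quantify it.
    framework: Dict[str, List[QualityMeasure]] = {
        "accuracy": [missing_rate],
        "confidentiality": [identifier_exposure],
    }

    def assess(df: pd.DataFrame) -> Dict[str, List[float]]:
        # Score a dataset against every registered criterion.
        return {c: [m(df) for m in ms] for c, ms in framework.items()}

Keeping criteria and measures decoupled in this way mirrors how the framework was assembled: criteria drawn from the QAF, DatQAM and the WHO's quality criteria, with measures adapted or added (such as confidentiality from the DQAT) as the research required.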