A Model for Setting Optimal Data-Acquisition Policy and its Application with Clinical Data

Manual data acquisition is often subject to incompleteness: data attributes go missing because of time and data-availability constraints, which may reduce the usability of the data for analysis and decision making. This study introduces a novel optimization model for designating attributes in a dataset as mandatory or voluntary. The model can guide the decision of whether to enforce the acquisition of certain attributes, given certain constraints and dependencies. The feasibility and potential contribution of the proposed model were evaluated with a clinical dataset covering colonoscopy procedures performed in a large hospital over a 4-year period. The evaluation demonstrated that the model can be reasonably estimated within the given context, and that its implementation may contribute important insight toward improving data quality. The current data-acquisition setup was shown to be suboptimal, and further evaluation identified factors that influence incompleteness and may call for revisions to current data-acquisition policies.
