Imputation of Missing Diagnosis of Diabetes in an Administrative EMR System

Administrative electronic medical records (EMRs) contain rich patient data and are an important data source for health informatics studies. Poor or missing diagnosis coding is prevalent in such EMRs; it is difficult to eliminate, but its effects can be mitigated by imputation techniques. In this work, using an administrative EMR database in Singapore, we applied popular machine learning methods to model the relations between diseases and healthcare utilization features, and used the resulting model to impute missing diagnoses of diabetes. The imputation was then partially validated against supplementary clinical data. The structured method in this work can be readily extended to other diseases and would benefit other work in health services and research.
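The general idea of model-based imputation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the models or features used, so a plain logistic regression stands in for the "popular machine learning methods", and the two utilization features (scaled counts of specialist visits and of glucose-related lab orders) and the toy records are hypothetical.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent (stand-in model)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of the log-loss with respect to z
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that the diagnosis is present."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def impute_missing(records, threshold=0.5):
    """Train on records with a coded diagnosis, then fill in the uncoded ones.

    Each record is a dict with 'features' (list of floats in [0, 1]) and
    'diabetes' (1, 0, or None when the diagnosis code is missing).
    """
    labelled = [r for r in records if r["diabetes"] is not None]
    w, b = train_logistic(
        [r["features"] for r in labelled],
        [r["diabetes"] for r in labelled],
    )
    result = []
    for r in records:
        if r["diabetes"] is None:
            p = predict_proba(w, b, r["features"])
            result.append({**r, "diabetes": int(p >= threshold), "imputed": True})
        else:
            result.append({**r, "imputed": False})
    return result

# Hypothetical toy data: [specialist visits, glucose-related lab orders], scaled to [0, 1].
records = [
    {"features": [0.9, 0.8], "diabetes": 1},
    {"features": [0.8, 0.9], "diabetes": 1},
    {"features": [0.7, 0.7], "diabetes": 1},
    {"features": [0.1, 0.2], "diabetes": 0},
    {"features": [0.2, 0.1], "diabetes": 0},
    {"features": [0.0, 0.1], "diabetes": 0},
    {"features": [0.85, 0.9], "diabetes": None},  # diagnosis code missing
    {"features": [0.05, 0.1], "diabetes": None},  # diagnosis code missing
]

result = impute_missing(records)
for r in result:
    print(r["features"], r["diabetes"], "(imputed)" if r["imputed"] else "")
```

Observed diagnosis codes are left untouched and imputed values are flagged, so that imputed labels can later be checked against an independent source, in the spirit of the partial validation mentioned in the abstract.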
