Can structured EHR data support clinical coding? A data mining approach

Structured data formats are gaining momentum in electronic health record systems and can be leveraged for decision support and research. Nevertheless, such structured data formats have not been explored for clinical coding, which is an essential process requiring significant manual workload in health organizations. This article explores the extent to which fully structured clinical data can support the assignment of clinical codes to inpatient episodes. It does so through the design and application of a methodology that tackles high dimensionality issues, addresses the multi-label nature of coding and optimizes model parameters. The methodology encompasses transforming database entries to define a feature set and building a data matrix representation, and testing combinations of filter feature selection methods with machine learning models to predict code assignment. The methodology is tested with a real hospital dataset. Results show varying predictive power across codes but demonstrate the potential of leveraging structured data to reduce workload and increase efficiency in clinical coding.
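The pipeline summarized above (database entries, then a data matrix, then filter feature selection per code) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation; the abstract does not specify which features, filters or classifiers were used. It assumes: each structured entry is an `(episode, feature)` pair turned into a binary presence column; the multi-label problem is decomposed into one binary problem per code (binary relevance); and mutual information is the filter criterion. All function names, features and codes below are hypothetical. A classifier would then be trained per code on the selected columns only.

```python
from collections import Counter
from math import log2


def build_matrix(entries):
    """Turn hypothetical (episode, feature) rows into a binary episode-by-feature matrix."""
    episodes = sorted({e for e, _ in entries})
    features = sorted({f for _, f in entries})
    row = {e: i for i, e in enumerate(episodes)}
    col = {f: j for j, f in enumerate(features)}
    X = [[0] * len(features) for _ in episodes]
    for e, f in entries:
        X[row[e]][col[f]] = 1  # 1 = feature recorded at least once during the episode
    return episodes, features, X


def build_labels(episodes, episode_codes, codes):
    """Multi-label target matrix: one 0/1 column per clinical code."""
    return [[int(c in episode_codes.get(e, set())) for c in codes] for e in episodes]


def mutual_information(x, y):
    """Mutual information, in bits, between two discrete vectors of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def select_per_code(X, Y, features, codes, k):
    """Binary relevance: score every feature against each code separately, keep the top k."""
    selected = {}
    for j, code in enumerate(codes):
        y = [labels[j] for labels in Y]
        scores = [(mutual_information([r[f] for r in X], y), f) for f in range(len(features))]
        scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by column index
        selected[code] = [features[f] for _, f in scores[:k]]
    return selected


# Toy, made-up episodes; "E11" and "J18" are used only as example code labels.
entries = [
    ("ep1", "insulin"), ("ep1", "hba1c"),
    ("ep2", "insulin"), ("ep2", "xray_chest"),
    ("ep3", "xray_chest"), ("ep3", "antibiotic"),
    ("ep4", "paracetamol"),
]
episode_codes = {"ep1": {"E11"}, "ep2": {"E11", "J18"}, "ep3": {"J18"}}
codes = ["E11", "J18"]

episodes, features, X = build_matrix(entries)
Y = build_labels(episodes, episode_codes, codes)
print(select_per_code(X, Y, features, codes, k=1))
# {'E11': ['insulin'], 'J18': ['xray_chest']}
```

Scoring each code independently keeps the filter step cheap on a high-dimensional matrix, at the cost of ignoring dependencies between codes; other filters (for example chi-squared) could be swapped in for `mutual_information` without changing the rest of the sketch.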