Interactive Cohort Identification of Sleep Disorder Patients Using Natural Language Processing and i2b2

Nationwide Children's Hospital established an i2b2 (Informatics for Integrating Biology & the Bedside) application for sleep disorder cohort identification. Discrete data were extracted from semi-structured sleep study reports. The system proved more efficient than traditional manual chart review, and it enabled searches that were previously not possible.

OBJECTIVE We report on the development and implementation of the sleep disorder i2b2 cohort identification system using natural language processing of semi-structured documents.

METHODS We developed a natural language processing approach to automatically parse concepts and their values from semi-structured sleep study documents. Two parsers were developed: a regular expression parser for extracting numeric concepts and an NLP-based tree parser for extracting textual concepts. Concepts were further organized into i2b2 ontologies based on document structure and domain knowledge.

RESULTS A total of 26,550 concepts were extracted, 99% of them textual. From the sleep study documents, 1.01 million facts were extracted, including demographic information, sleep study lab results, medications, procedures, and diagnoses, among others. The average accuracy of terminology parsing was over 83% when compared against parsing by experts. The system is capable of capturing both standard and non-standard terminologies. The time for cohort identification was reduced from a few weeks to a few seconds.

CONCLUSION Natural language processing was shown to be powerful for quickly converting large amounts of semi-structured or unstructured clinical data into discrete concepts. In combination with intuitive domain-specific ontologies, these concepts allow fast and effective interactive cohort identification through the i2b2 platform for research and clinical use.
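The regular expression parser for numeric concepts described in the methods can be illustrated with a minimal Python sketch. The abstract does not give the report layout or the actual patterns, so the "Label: value unit" line format, the field names in the sample, and the function name `parse_numeric_concepts` are all assumptions made for illustration only.

```python
import re

# Assumed line layout: "<concept label>: <number> <optional unit>".
# The real sleep study report format is not specified in the abstract.
NUMERIC_LINE = re.compile(
    r"^\s*(?P<concept>[A-Za-z][A-Za-z0-9 /()%-]*?)\s*:\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z%/]+)?\s*$"
)


def parse_numeric_concepts(report_text):
    """Return (concept, value, unit) tuples for lines that carry a numeric value."""
    facts = []
    for line in report_text.splitlines():
        match = NUMERIC_LINE.match(line)
        if match:
            facts.append(
                (
                    match.group("concept").strip(),
                    float(match.group("value")),
                    match.group("unit"),  # None when no unit is present
                )
            )
    return facts


# Hypothetical report fragment; the labels and values are invented for the example.
SAMPLE_REPORT = """\
Total sleep time: 412.5 min
Sleep efficiency: 88 %
Apnea-hypopnea index: 3.2 /hr
Impression: mild obstructive sleep apnea
"""

facts = parse_numeric_concepts(SAMPLE_REPORT)
for concept, value, unit in facts:
    print(concept, value, unit)
```

The textual "Impression" line is skipped here because it has no numeric value; in the system described above, such content would fall to the tree parser for textual concepts.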
