A corpus-based approach for automated LOINC mapping

OBJECTIVE To determine whether the knowledge contained in a rich corpus of local terms mapped to LOINC (Logical Observation Identifiers Names and Codes) could be leveraged to help map local terms from other institutions.

METHODS We developed two models to test our hypothesis. The first, based on supervised machine learning, was built with Apache's OpenNLP Maxent; the second, based on information retrieval, was built with Apache's Lucene. The models were validated by random subsampling repeated 20 times, each iteration using an 80/20 split for training and testing, respectively. We also evaluated the performance of these models on all laboratory terms from three test institutions.

RESULTS Across the 20 validation iterations, Maxent ranked the correct LOINC code first for between 70.5% and 71.4% of local terms, and Lucene did so for between 63.7% and 65.0%. For all laboratory terms from the three test institutions, Maxent ranked the correct LOINC code first for between 73.5% and 84.6% (mean 78.9%) of local terms, whereas Lucene did so for between 66.5% and 76.6% (mean 71.9%). Using a cut-off score of 0.46, Maxent always ranked the correct LOINC code first for over 57% of local terms.

CONCLUSIONS This study showed that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from other institutions. Using freely available software tools, we developed a data-driven automated approach that operates on term descriptions from existing mappings in the corpus. Accurate and efficient automated mapping methods can help to accelerate adoption of vocabulary standards and promote widespread health information exchange.
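The validation procedure described in the abstract (random subsampling repeated 20 times with 80/20 splits, scored by how often the correct code is ranked first) can be illustrated with a minimal Python sketch. This is not the study's implementation: the token-overlap ranker below is a hypothetical stand-in for the Maxent and Lucene models, and the function names, the toy terms, and the `CODE-*` labels are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict


def split_80_20(pairs, rng):
    """Shuffle (term, code) pairs and split them 80/20 into train and test."""
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def train_token_index(train_pairs):
    """Hypothetical stand-in model: count how often each token occurs with each code."""
    index = defaultdict(Counter)
    for term, code in train_pairs:
        for token in term.lower().split():
            index[token][code] += 1
    return index


def rank_codes(index, term):
    """Rank candidate codes by summed token co-occurrence counts, best first."""
    scores = Counter()
    for token in term.lower().split():
        for code, count in index.get(token, {}).items():
            scores[code] += count
    return [code for code, _ in scores.most_common()]


def top1_accuracy(index, test_pairs):
    """Fraction of test terms whose correct code is ranked first."""
    hits = sum(1 for term, code in test_pairs if rank_codes(index, term)[:1] == [code])
    return hits / len(test_pairs)


def repeated_subsampling(pairs, iterations=20, seed=0):
    """Repeat the 80/20 split and return the (min, max) top-1 accuracy."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(iterations):
        train, test = split_80_20(pairs, rng)
        accuracies.append(top1_accuracy(train_token_index(train), test))
    return min(accuracies), max(accuracies)


# Toy corpus: invented local terms with placeholder codes (not real LOINC codes).
TOY_PAIRS = [
    ("glucose serum", "CODE-A"), ("serum glucose fasting", "CODE-A"),
    ("glucose ser", "CODE-A"), ("glucose blood serum", "CODE-A"),
    ("sodium urine", "CODE-B"), ("urine sodium random", "CODE-B"),
    ("sodium ur", "CODE-B"), ("sodium urine 24h", "CODE-B"),
    ("hemoglobin blood", "CODE-C"), ("blood hemoglobin total", "CODE-C"),
]

if __name__ == "__main__":
    low, high = repeated_subsampling(TOY_PAIRS)
    print(f"top-1 accuracy range over 20 splits: {low:.2f} to {high:.2f}")
```

Reporting the minimum and maximum across iterations mirrors the "between X% and Y%" ranges in the results; a real evaluation would substitute the trained models for the toy ranker.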
