A context-blocks model for identifying clinical relationships in patient records

Background: Patient records contain valuable information regarding explanation of diagnosis, progression of disease, prescription and/or effectiveness of treatment, and more. Automatic recognition of clinically important concepts and the identification of relationships between those concepts in patient records are preliminary steps for many important applications in medical informatics, ranging from quality of care to hypothesis generation.

Methods: In this work we describe an approach that facilitates the automatic recognition of eight relationships defined between medical problems, treatments and tests. Unlike the traditional bag-of-words representation, we represent a relationship with a scheme of five distinct context-blocks determined by the position of concepts in the text. As a preliminary step to relationship recognition, and in order to provide an end-to-end system, we also addressed the automatic extraction of medical problems, treatments and tests. Our approach combined the outcome of a statistical model for concept recognition and simple natural language processing features in a conditional random fields model. A set of 826 patient records from the 4th i2b2 challenge was used for training and evaluating the system.

Results: Our concept recognition system achieved an F-measure of 0.870 for exact span concept detection. Moreover, the context-block representation of relationships was more successful (F-measure = 0.775) at identifying relationships than bag-of-words (F-measure = 0.402). Most importantly, the performance of the end-to-end system of relationship extraction using automatically extracted concepts (F-measure = 0.704) was comparable to that obtained using manually annotated concepts (F-measure = 0.711), and their difference was not statistically significant.

Conclusions: We extracted important clinical relationships from text in an automated manner, starting with concept recognition and ending with relationship identification. The advantage of the context-blocks representation scheme was the correct management of word position information, which may be critical in identifying certain relationships. Our results may serve as a benchmark for comparison to other systems developed on i2b2 challenge data. Finally, our system may serve as a preliminary step for other discovery tasks in medical informatics.
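
To illustrate how a position-aware representation differs from bag-of-words, the following is a minimal Python sketch, not the authors' implementation. The abstract states only that five context-blocks are determined by the position of the concepts; the specific split used here (text before the first concept, the first concept, text between the concepts, the second concept, text after the second concept) is an assumption, as are the function names `context_blocks`, `block_features` and `bag_of_words`, the block labels, and the example sentence.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive

# Assumed block layout; the abstract does not name the five blocks.
BLOCK_NAMES = ("before", "concept1", "between", "concept2", "after")


def context_blocks(tokens: List[str], first: Span, second: Span) -> Dict[str, List[str]]:
    """Split a tokenized sentence into five blocks around two concept spans."""
    # Order the spans by position so "concept1" is always the leftmost one.
    (s1, e1), (s2, e2) = sorted([first, second])
    if not (0 <= s1 < e1 <= s2 < e2 <= len(tokens)):
        raise ValueError("concept spans must be non-empty, non-overlapping and in range")
    pieces = (tokens[:s1], tokens[s1:e1], tokens[e1:s2], tokens[s2:e2], tokens[e2:])
    return dict(zip(BLOCK_NAMES, pieces))


def block_features(blocks: Dict[str, List[str]]) -> List[str]:
    """Prefix each token with its block name so word position is preserved."""
    return [f"{name}={tok.lower()}" for name in BLOCK_NAMES for tok in blocks[name]]


def bag_of_words(tokens: List[str]) -> List[str]:
    """Position-free baseline: the set of lowercased tokens."""
    return sorted({tok.lower() for tok in tokens})


# Hypothetical example: a treatment ("aspirin") and a problem ("chest pain").
tokens = "She was given aspirin for her chest pain yesterday".split()
blocks = context_blocks(tokens, (3, 4), (6, 8))
features = block_features(blocks)
```

In this sketch the word "for" becomes the feature `between=for`, so a classifier can tell that it occurs between the two concepts; the bag-of-words baseline keeps only the word itself and loses that positional cue.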
