Negation recognition in medical narrative reports

A substantial amount of medical data, such as discharge summaries and operative reports, is stored in electronic textual form. Databases of free-text clinical narrative reports often need to be searched to find relevant information for clinical and research purposes. The context of negation, that is, a negative finding, is of special importance, since many of the most frequently described findings are negative. When free-text narratives are searched for patients with a certain medical condition and negation is not taken into account, many of the retrieved documents will be irrelevant. Hence, negation is a major source of poor precision in medical information retrieval systems. Previous research has shown that negated findings may be difficult to identify when the words that imply negation (negation signals) are more than a few words away from them. We present a new pattern learning method for automatic identification of negative context in clinical narrative reports. We compare the new algorithm to previous methods proposed for the same task and show its advantages: it is more accurate than other machine learning methods, and it is much faster to develop than manual knowledge engineering techniques while matching their accuracy. The new algorithm can also be applied to further context identification and information extraction tasks.
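To illustrate why the distance between a negation signal and a finding matters, the following is a minimal sketch of a naive fixed-window detector. It is not the pattern learning method presented here; the signal list, the function name `is_negated`, and the `window` parameter are assumptions made only for illustration.

```python
import re

# Hypothetical, deliberately tiny list of negation signals (an assumption,
# not the lexicon used in the paper).
NEGATION_SIGNALS = {"no", "not", "denies", "without"}


def is_negated(sentence, finding, window=3):
    """Return True if a negation signal occurs within `window` tokens
    before the first occurrence of `finding` (a single word)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = finding.lower()
    if target not in tokens:
        return False
    idx = tokens.index(target)
    # Only look at the tokens immediately preceding the finding.
    preceding = tokens[max(0, idx - window):idx]
    return any(tok in NEGATION_SIGNALS for tok in preceding)


near = "The patient denies fever."
far = "No evidence of acute focal infiltrate or pleural effusion."

print(is_negated(near, "fever"))               # True
print(is_negated(far, "effusion"))             # False: signal is 8 tokens away
print(is_negated(far, "effusion", window=8))   # True
```

With a short window the distant negation in the second sentence is missed, while simply widening the window risks attaching a negation signal to findings it does not govern. This trade-off is one reason to learn patterns rather than rely on a fixed distance.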
