Risk Event and Probability Extraction for Modeling Medical Risks

In this paper, we address the task of extracting risk events and probabilities from free text, focusing in particular on the biomedical domain. While our initial motivation is to enable the determination of the parameters of a Bayesian belief network, our approach is not specific to that use case. We are the first to investigate this task as a sequence tagging problem, in which spans of text are labeled as events A or B and then used to construct probability statements of the form P(A|B)=x. We show that our approach significantly outperforms an entity extraction baseline on a new annotated medical risk event corpus. We also explore semi-supervised methods, which yield modest improvements and encourage further work in this direction.
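As a rough illustration of the sequence tagging formulation, the sketch below turns a tagged token sequence into a statement of the form P(A|B)=x. It is not the paper's implementation. The BIO tag names (B-A, I-A, B-B, I-B, B-P), the choice to tag the probability expression as its own span, the helper names, and the placeholder sentence are all assumptions made for illustration, and the tags are supplied by hand rather than predicted by a trained tagger.

```python
from typing import List, Optional, Tuple


def decode_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from BIO tags such as B-A, I-A, B-B, B-P, O."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label:
            # Continue the span that is currently open.
            words.append(token)
            continue
        if label is not None:
            # Any other tag closes the open span.
            spans.append((label, " ".join(words)))
        if prefix in ("B", "I"):
            # Start a new span (a stray I- tag is treated like B-).
            label, words = tag_label, [token]
        else:
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def parse_probability(text: str) -> float:
    """Convert '20%' or '0.2' to a float in [0, 1]; other formats are not handled."""
    text = text.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)


def to_statement(spans: List[Tuple[str, str]]) -> Optional[str]:
    """Build 'P(A | B) = x', assuming at most one span per label (hypothetical A, B, P)."""
    found = dict(spans)
    if not all(key in found for key in ("A", "B", "P")):
        return None
    return f"P({found['A']} | {found['B']}) = {parse_probability(found['P']):g}"


# Placeholder sentence with hand-assigned tags; not taken from the annotated corpus.
tokens = ["Among", "patients", "with", "condition", "X", ",",
          "outcome", "Y", "occurs", "in", "20%", "."]
tags = ["O", "O", "O", "B-B", "I-B", "O",
        "B-A", "I-A", "O", "O", "B-P", "O"]

statement = to_statement(decode_spans(tokens, tags))
print(statement)
```

In a real system the tags would come from a trained sequence model, and sentences containing several events or probabilities would need a pairing step that this sketch leaves out.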
