CausalTriad: Toward Pseudo Causal Relation Discovery and Hypotheses Generation from Medical Text Data

Deriving pseudo causal relations from medical text data lies at the heart of medical literature mining. Existing studies have used extraction models to find pseudo causal relations within single sentences, but the knowledge created by causation transitivity, which often spans multiple sentences, has not been considered. Furthermore, we observe that many pseudo causal relations follow the rule of causation transitivity, which makes it possible to discover unseen causal relations and generate new causal relation hypotheses. In this paper, we address these two issues by proposing a factor graph model that incorporates three clues to discover causation expressions in text data. We propose four types of triad structures to represent the rules of causation transitivity among causal relations. Our proposed model, called CausalTriad, uses textual and structural knowledge to infer pseudo causal relations from the triad structures. Experimental results on two datasets demonstrate that (a) CausalTriad is effective for pseudo causal relation discovery within and across sentences; (b) CausalTriad is highly capable of recognizing implicit pseudo causal relations; and (c) CausalTriad can infer missing or new pseudo causal relations from text data.
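The idea of causation transitivity can be illustrated with a minimal sketch. This is not the CausalTriad factor graph model: it only applies the plain rule "if A causes B and B causes C, hypothesize that A causes C" to a set of already extracted pairs. The function name `transitive_hypotheses` and the placeholder entities `A`, `B`, `C` are assumptions made for illustration, and the sketch ignores the textual clues and the four triad types described above.

```python
from itertools import product


def transitive_hypotheses(known):
    """Return (cause, effect) pairs implied by one step of transitivity.

    `known` is an iterable of (cause, effect) pairs. A pair (a, c) is
    proposed when (a, b) and (b, c) are both known, a != c, and (a, c)
    is not already known. Hypothetical helper, for illustration only.
    """
    known = set(known)
    hypotheses = set()
    for (a, b), (b2, c) in product(known, repeat=2):
        if b == b2 and a != c and (a, c) not in known:
            hypotheses.add((a, c))
    return hypotheses


# Placeholder entities, not taken from any dataset.
relations = [("A", "B"), ("B", "C")]
candidates = transitive_hypotheses(relations)
```

Here the two known relations form an open triad, so the only candidate is the pair that would close it. A probabilistic model would score such candidates rather than accept them outright.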
