A Probabilistic Model with Commonsense Constraints for Pattern-based Temporal Fact Extraction

Textual patterns (e.g., Country’s president Person) are specified and/or generated for extracting factual information from unstructured data. Pattern-based information extraction methods have been recognized for their efficiency and transferability. However, not every pattern is reliable: a major challenge is to derive the most complete and accurate facts from diverse and sometimes conflicting extractions. In this work, we propose a probabilistic graphical model that formulates fact extraction as a generative process. It automatically infers true facts and pattern reliability without any supervision. It has two novel designs specifically for temporal facts: (1) it models pattern reliability on two types of time signals, namely the temporal tag in the text and the text generation time; (2) it models commonsense constraints as observable variables. Experimental results demonstrate that our model significantly outperforms existing methods on extracting true temporal facts from news data.
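
To make the idea of jointly inferring true facts and pattern reliability more concrete, the following is a minimal sketch of a generic iterative truth-discovery heuristic. It is not the probabilistic graphical model proposed in this work: the function name `estimate_truth`, the toy extractions, the smoothing constants, and the "one value per (country, year) slot" rule used as a stand-in commonsense constraint are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy extractions: (pattern, slot, extracted value).
# A slot is (country, year); the assumed constraint is one president per slot.
EXAMPLE = [
    ("$Country's president $Person", ("France", 2018), "Macron"),
    ("president $Person of $Country", ("France", 2018), "Macron"),
    ("$Person visited $Country", ("France", 2018), "Trump"),
    ("$Country's president $Person", ("US", 2018), "Trump"),
    ("president $Person of $Country", ("US", 2018), "Trump"),
    ("$Person visited $Country", ("US", 2018), "Macron"),
]


def estimate_truth(extractions, n_iters=10):
    """Alternate between picking one value per slot and re-scoring patterns."""
    patterns = {p for p, _, _ in extractions}
    reliability = {p: 0.5 for p in patterns}  # uninformed starting point
    truth = {}
    for _ in range(n_iters):
        # Step 1: each candidate value is scored by the summed reliability
        # of the patterns that extracted it.
        scores = defaultdict(lambda: defaultdict(float))
        for p, slot, value in extractions:
            scores[slot][value] += reliability[p]
        # Assumed constraint: keep only the best-scoring value per slot.
        truth = {s: max(vals, key=vals.get) for s, vals in scores.items()}
        # Step 2: a pattern's reliability is the smoothed fraction of its
        # extractions that agree with the current truth estimate.
        hits, total = defaultdict(int), defaultdict(int)
        for p, slot, value in extractions:
            total[p] += 1
            hits[p] += int(truth[slot] == value)
        reliability = {p: (hits[p] + 1) / (total[p] + 2) for p in patterns}
    return truth, reliability


if __name__ == "__main__":
    truth, reliability = estimate_truth(EXAMPLE)
    for slot, value in sorted(truth.items()):
        print(slot, "->", value)
    for pattern, score in sorted(reliability.items()):
        print(pattern, round(score, 2))
```

On this toy input, the two president patterns agree with each other and gain reliability, while the "visited" pattern, which conflicts with them on every slot, loses it. The sketch ignores the time signals entirely; it only illustrates the unsupervised interplay between fact truth and pattern reliability.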
