Temporal knowledge extraction from large-scale text corpus

Knowledge, in practice, is time-variant: many relations are valid only for a certain period of time. This highlights the importance of harvesting time-aware knowledge, i.e., relational facts coupled with their valid temporal intervals. Inspired by pattern-based information extraction systems, we resort to temporal patterns to extract time-aware knowledge from free text. However, pattern design is extremely laborious and time-consuming even for a single relation, and free text is usually ambiguous, which makes temporal instance extraction extremely difficult. Therefore, in this work, we study the problem of temporal knowledge extraction in two steps: (1) temporal pattern extraction, by automatically analysing a large-scale text corpus with a small number of seed temporal facts, and (2) temporal instance extraction, by applying the identified temporal patterns. For pattern extraction, we introduce several techniques, including corpus annotation, pattern generation, scoring and clustering, to improve both the accuracy and the coverage of the extracted patterns. For instance extraction, we propose a double-check strategy to improve accuracy and a set of node-extension rules to improve coverage. We conduct extensive experiments on real-world datasets and compare our approach with state-of-the-art systems. Experimental results verify the effectiveness of the proposed methods for temporal knowledge harvesting.
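
The two-step idea can be illustrated with a minimal sketch. It is not the method evaluated in this work: the seed facts, the toy corpus, the slot names and the frequency threshold are all hypothetical, and a plain support count stands in for the scoring, clustering, double-check and node-extension components, whose details are not given here.

```python
import re
from collections import Counter

# Hypothetical seed temporal facts (subject, object, year), invented for illustration.
SEEDS = [("Alice", "Acme", "2001"), ("Bob", "Globex", "1998")]

# Hypothetical toy corpus standing in for a large-scale text collection.
CORPUS = [
    "Alice joined Acme in 2001.",
    "Bob joined Globex in 1998.",
    "Alice visited Acme in 2001.",
    "Carol joined Initech in 2010.",
    "Dave visited Initech in 2012.",
]

# Assumed slot shapes: capitalised single-token entities and four-digit years.
SLOT_REGEX = {
    "<SUBJ>": r"(?P<subj>[A-Z]\w+)",
    "<OBJ>": r"(?P<obj>[A-Z]\w+)",
    "<TIME>": r"(?P<time>\d{4})",
}


def generate_patterns(corpus, seeds):
    """Step 1a: turn sentences that mention a seed fact into slotted patterns."""
    support = Counter()
    for subj, obj, year in seeds:
        for sentence in corpus:
            if subj in sentence and obj in sentence and year in sentence:
                pattern = (sentence.replace(subj, "<SUBJ>")
                           .replace(obj, "<OBJ>")
                           .replace(year, "<TIME>"))
                support[pattern] += 1
    return support


def select_patterns(support, min_support=2):
    """Step 1b: keep patterns backed by enough seed matches (a crude stand-in for scoring)."""
    return [p for p, n in support.items() if n >= min_support]


def pattern_to_regex(pattern):
    """Compile a slotted pattern, escaping literal text and expanding slots."""
    parts = re.split(r"(<SUBJ>|<OBJ>|<TIME>)", pattern)
    return re.compile("".join(SLOT_REGEX.get(p, re.escape(p)) for p in parts))


def extract_instances(corpus, patterns, seeds):
    """Step 2: apply the selected patterns and return facts not already in the seeds."""
    found = set()
    for regex in map(pattern_to_regex, patterns):
        for sentence in corpus:
            match = regex.fullmatch(sentence)
            if match:
                found.add((match["subj"], match["obj"], match["time"]))
    return found - set(seeds)


support = generate_patterns(CORPUS, SEEDS)
patterns = select_patterns(support)
new_facts = extract_instances(CORPUS, patterns, SEEDS)
print(patterns)   # ['<SUBJ> joined <OBJ> in <TIME>.']
print(new_facts)  # {('Carol', 'Initech', '2010')}
```

In this toy run the "visited" pattern is supported by only one seed and is discarded, so the sentence about Dave yields no fact. This shows how filtering patterns trades coverage for accuracy, the tension that the scoring and clustering steps are meant to manage.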
