Open-Schema Event Profiling for Massive News Corpora

With the rapid growth of online information services, an enormous volume of news data has become available. To help people quickly digest this flood of information, we define a new problem, schema-based news event profiling: profiling the events reported in open-domain news corpora with a set of slots and slot-value pairs for each event, where the set of slots forms the schema of an event type. Such profiling not only provides readers with concise views of events, but also facilitates applications such as information retrieval, knowledge graph construction, and question answering. The task is challenging for three reasons. First, events and event types are both initially unknown and must be discovered. Second, there are no pre-defined event-type schemas. Third, even once schemas have been extracted, generating event profiles from them remains essential yet demanding. To address these challenges, we propose a fully automatic, unsupervised, three-step framework for obtaining event profiles. First, we develop a Bayesian non-parametric model that detects events and event types by exploiting the slot expressions of the entities mentioned in news articles. Second, we propose an unsupervised embedding model for schema induction that encodes the following insight: an entity may serve as the value of multiple slots in an event, but the more sentences in which it appears together with the same set of other entities from the event, the more similar its slots in those sentences tend to be. Finally, we build event profiles by extracting slot values for each event based on the slots' expression patterns. To the best of our knowledge, this is the first work on schema-based profiling for news events. Experimental results on a large news corpus demonstrate that our method outperforms state-of-the-art baselines on event detection, schema induction, and event profiling.
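
To make the terminology concrete, the following is a minimal sketch of how a schema and an event profile could be represented, assuming only what the abstract states: a schema is the set of slots of an event type, and a profile is a set of slot-value pairs for one event. The class names `Schema` and `EventProfile`, the helper methods, and the "earthquake" example with its slots and values are hypothetical illustrations. They are not taken from the paper and do not reproduce its models.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Schema:
    """Schema of an event type: the set of slots it defines (hypothetical class)."""
    event_type: str
    slots: frozenset


@dataclass
class EventProfile:
    """Profile of one event: slot-value pairs constrained by a schema (hypothetical class)."""
    schema: Schema
    values: dict = field(default_factory=dict)

    def fill(self, slot: str, value: str) -> None:
        # Only slots that belong to the event type's schema may be filled.
        if slot not in self.schema.slots:
            raise KeyError(f"slot {slot!r} is not in schema {self.schema.event_type!r}")
        # A slot may take several values in one event, so keep a list per slot.
        self.values.setdefault(slot, []).append(value)

    def missing_slots(self) -> set:
        # Slots of the schema for which no value has been extracted yet.
        return set(self.schema.slots) - set(self.values)


# Illustrative data only: the event type, slots, and values are made up.
earthquake = Schema("earthquake", frozenset({"location", "magnitude", "casualties"}))
profile = EventProfile(earthquake)
profile.fill("location", "example city")
profile.fill("magnitude", "7.0")
print(sorted(profile.missing_slots()))
```

In this representation every event of the same type shares one `Schema` object, which mirrors the abstract's separation between inducing a schema per event type and filling slot values per event.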
