Identifying Event Descriptions using Co-training with Online News Summaries

Systems that distill information about events from large corpora generally extract sentences that are relevant to a short event query. We present a novel co-training strategy for this task that exploits a multi-document news summary corpus of 2.5 million unlabeled sentences, obviating the need for extensive manual annotation. Our experiments indicate that this technique significantly outperforms standard classification approaches that combine features linearly. An analysis of our approach under various settings shows how the choice of classifier and parameters can be used to control runtime overhead while contributing to an absolute increase of 22% in recall.
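The abstract does not specify which classifiers, feature views, or selection rules the authors use. What follows is only a minimal sketch of the generic two-view co-training loop (in the style of Blum and Mitchell), under assumptions that are ours alone. Each sentence has two token "views": view A holds the sentence's own words, and view B holds hypothetical summary-derived features such as `in_summary`. Each view is handled by a small multinomial Naive Bayes classifier. In every round, each view's classifier pseudo-labels the unlabeled sentences it is most confident about, and those sentences join the shared labeled pool. The names `NaiveBayes`, `co_train`, `predict`, `rounds`, and `per_round` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(tok for d in class_docs for tok in d)
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def log_scores(self, doc):
        """Unnormalized log P(c) + sum_t log P(t | c) for every class c."""
        v = len(self.vocab)
        return {
            c: self.log_prior[c]
            + sum(math.log((self.counts[c][t] + 1) / (self.totals[c] + v)) for t in doc)
            for c in self.classes
        }

    def predict_with_confidence(self, doc):
        """Return (best label, its normalized posterior probability)."""
        scores = self.log_scores(doc)
        top = max(scores.values())
        probs = {c: math.exp(s - top) for c, s in scores.items()}
        best = max(probs, key=probs.get)
        return best, probs[best] / sum(probs.values())


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Two-view co-training (hypothetical setup, not the paper's exact method).

    labeled:   list of ((view_a_tokens, view_b_tokens), label)
    unlabeled: list of (view_a_tokens, view_b_tokens)
    In each round, the classifier for each view pseudo-labels the `per_round`
    pool examples it is most confident about. These examples join the shared
    labeled set, so each view also learns from examples the other view labeled.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = NaiveBayes().fit([x[view] for x, _ in labeled], [y for _, y in labeled])
            # Rank the remaining pool by this view's confidence; keep the top few.
            ranked = sorted(
                ((clf.predict_with_confidence(x[view]), i) for i, x in enumerate(pool)),
                key=lambda item: item[0][1],
                reverse=True,
            )[:per_round]
            for (label, _conf), i in ranked:
                labeled.append((pool[i], label))
            taken = {i for _, i in ranked}
            pool = [x for i, x in enumerate(pool) if i not in taken]
    # Retrain one final classifier per view on the grown labeled set.
    clf_a = NaiveBayes().fit([x[0] for x, _ in labeled], [y for _, y in labeled])
    clf_b = NaiveBayes().fit([x[1] for x, _ in labeled], [y for _, y in labeled])
    return clf_a, clf_b, labeled


def predict(clf_a, clf_b, example):
    """Combine both views by summing their per-class log scores."""
    a, b = clf_a.log_scores(example[0]), clf_b.log_scores(example[1])
    return max(a, key=lambda c: a[c] + b[c])
```

Suppose the loop is seeded with one event sentence (label 1) and one non-event sentence (label 0), plus a few unlabeled sentences. The loop then absorbs the unlabeled sentences with pseudo-labels, and the combined predictor scores new sentences using both views. Selecting the most confident examples regardless of class is a simplification. The original co-training formulation instead adds a fixed number of positive and negative examples per round.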
