Exploiting Background Information Networks to Enhance Bilingual Event Extraction Through Topic Modeling

In this paper we describe a novel approach that applies topic modeling with biased propagation to exploit global background knowledge, enhancing both the quality and the portability of event extraction. The distributions of event triggers and arguments in topically related documents are much more focused than those in a heterogeneous corpus. Based on this intuition, we apply topic modeling to automatically select training documents for annotation, and show that this significantly reduces the annotation cost required to achieve comparable performance for two different languages and two different genres. In addition, we conduct cross-document inference within each topic cluster and show that our approach advances the state of the art.
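The intuition that label distributions are more focused inside a topic cluster, and the idea of cross-document inference within each cluster, can be illustrated with a small Python sketch. This is not the paper's biased-propagation topic model. It assumes documents have already been assigned to topic clusters, uses hand-made toy mentions of the ambiguous trigger "fire", measures "focus" as the Shannon entropy of the trigger's event-type labels, and uses plain majority voting per (cluster, trigger) pair as a stand-in for cross-document inference. The names `MENTIONS`, `label_entropy`, `cluster_majority` and `refine` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical toy data: (topic_cluster, trigger_word, locally_predicted_event_type).
# The cluster ids stand in for the output of some topic model (an assumption).
MENTIONS = [
    ("conflict", "fire", "Attack"),
    ("conflict", "fire", "Attack"),
    ("conflict", "fire", "Attack"),
    ("conflict", "fire", "End-Position"),  # an isolated, likely wrong local label
    ("business", "fire", "End-Position"),
    ("business", "fire", "End-Position"),
]


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution over labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # p * log2(1/p) form avoids returning -0.0 for a single-label distribution.
    return sum((c / total) * log2(total / c) for c in counts.values())


def label_entropy(mentions, trigger, cluster=None):
    """Entropy of the event types assigned to `trigger`, corpus-wide or in one cluster."""
    labels = [e for c, t, e in mentions
              if t == trigger and (cluster is None or c == cluster)]
    return entropy(labels)


def cluster_majority(mentions):
    """Most frequent event type for each (cluster, trigger) pair."""
    votes = defaultdict(Counter)
    for cluster, trigger, event_type in mentions:
        votes[(cluster, trigger)][event_type] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


def refine(mentions):
    """Majority-vote stand-in for cross-document inference: replace each local
    label with the majority label of its (cluster, trigger) pair."""
    majority = cluster_majority(mentions)
    return [(c, t, majority[(c, t)]) for c, t, _ in mentions]


if __name__ == "__main__":
    print(round(label_entropy(MENTIONS, "fire"), 3))              # 1.0
    print(round(label_entropy(MENTIONS, "fire", "conflict"), 3))  # 0.811
    print(round(label_entropy(MENTIONS, "fire", "business"), 3))  # 0.0
    print(refine(MENTIONS)[3])  # ('conflict', 'fire', 'Attack')
```

On this toy data the event-type entropy of "fire" is 1 bit across the whole collection but lower within each cluster, and voting inside the conflict cluster overrides the isolated End-Position label. A fuller version would presumably weight votes by classifier confidence and cover arguments as well as triggers; those refinements are deliberately left out of this sketch.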
