Exploring the Usefulness of Cross-lingual Information Fusion for Refining Real-time News Event Extraction: A Preliminary Study

Many influential facts are now reported multiple times by different sources and in different languages. This paper presents the results of an experiment on deploying cross-lingual information fusion techniques to refine the output of a large-scale multilingual news event extraction system. An evaluation on a test corpus of 618 event descriptions referring to 523 real-world events revealed that the descriptions of about 10% of the events extracted by the monolingual systems could be refined. In particular, overall gains of 6.4% in recall and 4.8% in precision over the best monolingual system were obtained.
