Supervised Machine Learning Techniques to Detect TimeML Events in French and English

Identifying events in text is an information extraction task required by many NLP applications. It has received some attention in recent years through the TimeML specifications and the TempEval challenges. However, no reference result is available for French. In this paper, we try to fill this gap by proposing several event extraction systems that combine, for instance, Conditional Random Fields, language modeling and k-nearest neighbors. These systems are evaluated on French corpora and compared with state-of-the-art methods on English. The very good results obtained on both languages validate our approach.
