Model-Portability Experiments for Textual Temporal Analysis

We explore a semi-supervised approach to improving the portability of time expression recognition to non-newswire domains: we generate additional training examples by substituting the words of temporal expressions with potential synonyms. We draw synonyms from two sources: WordNet and the Latent Words Language Model (LWLM), which predicts synonyms in context using an unsupervised approach. We evaluate a state-of-the-art time expression recognition system, trained both with and without the additional examples, on data from TempEval 2010, Reuters and Wikipedia. The LWLM provides substantial improvements on the Reuters corpus and smaller improvements on the Wikipedia corpus. WordNet alone never improves performance, though intersecting the examples from the LWLM and WordNet provides more stable results for Wikipedia.
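
The substitution step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the BIO label scheme, the function names `augment` and `intersect`, and the toy synonym tables (which stand in for real LWLM and WordNet output) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (word, BIO label); the label scheme is an assumption


def augment(sentence: List[Token], synonyms: Dict[str, Set[str]]) -> List[List[Token]]:
    """Create new training sentences by replacing one time-expression
    token at a time with each of its candidate synonyms."""
    generated = []
    for i, (word, label) in enumerate(sentence):
        if label == "O":
            continue  # only tokens inside a time expression are substituted
        for candidate in sorted(synonyms.get(word.lower(), ())):
            copy = list(sentence)
            copy[i] = (candidate, label)  # keep the original label
            generated.append(copy)
    return generated


def intersect(a: Dict[str, Set[str]], b: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only the synonyms proposed by both sources."""
    return {w: a[w] & b[w] for w in a.keys() & b.keys() if a[w] & b[w]}


# Toy synonym tables (hypothetical, not real LWLM or WordNet output).
lwlm = {"last": {"previous", "next"}, "week": {"month", "year", "day"}}
wordnet = {"last": {"previous", "final"}, "week": {"month", "hebdomad"}}

sentence = [("Sales", "O"), ("rose", "O"), ("last", "B-TIMEX"), ("week", "I-TIMEX")]

for new_sentence in augment(sentence, intersect(lwlm, wordnet)):
    print(" ".join(word for word, _ in new_sentence))
```

Substituting one token at a time keeps each generated sentence close to the original. Intersecting the two sources trades coverage for precision, which is one plausible reading of why the combined setting is reported as more stable.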
