Extracting Temporal Information from Open Domain Text: A Comparative Exploration

The utility of data-driven techniques in the end-to-end problem of temporal information extraction is unclear. Recognition of temporal expressions yields readily to machine learning, but normalization seems to call for a rule-based approach. We explore two aspects of the (potential) utility of data-driven methods in the temporal information extraction task. First, we look at whether improving recognition beyond the rule base used by a normalizer has an effect on normalization performance, comparing normalizer performance when fed by several recognition systems. We also perform an error analysis of our normalizer’s performance to uncover aspects of the normalization task that might be amenable to data-driven techniques.
