TimeBank-Driven TimeML Analysis

The design of TimeML as an expressive language for temporal information brings both promises and challenges; in particular, its representational properties raise the bar for traditional information extraction methods applied to the task of text-to-TimeML analysis. A reference corpus, such as TimeBank, is an invaluable asset in this situation; however, certain characteristics of TimeBank, primarily its size and consistency, present challenges of their own. We discuss the design, implementation, and performance of an automatic TimeML-compliant annotator, trained on TimeBank, which deploys a hybrid analytical strategy: aggressive finite-state processing over linguistic annotations is mixed with a state-of-the-art machine learning technique capable of leveraging large amounts of unannotated data. The results we report are encouraging in the light of a close analysis of TimeBank; at the same time, they indicate the need for more infrastructure work, especially in the direction of creating a larger and more robust reference corpus.