Evaluating Temporal Information Understanding with Temporal Question Answering

The temporal annotation scheme TimeML was developed to support research in complex temporal question answering (QA). Given the complexity of temporal QA, most efforts so far have focused on extracting temporal information, which has been assessed with corpus-based evaluation. However, the QA task represents a natural way to evaluate temporal information understanding, and creating question sets is less costly for humans than manually annotating temporal information, which corpus-based evaluation requires. Additionally, QA performance better captures the understanding of important temporal information, whereas corpus-based evaluation weights all information equally in scoring. This paper presents a temporal QA system that performs temporal reasoning. It can answer temporal questions (factoid, list, and yes/no) about any document annotated in TimeML. We show how this system can be used to evaluate automated temporal information understanding. Our QA-based evaluation results suggest that (i) the available temporal annotations are incomplete, and (ii) QA provides a less costly and more reliable way of evaluating temporal understanding systems. To favour replicability, we make the temporal QA system and the question set used in the evaluation available.