The objective of the LORELEI Situation Frame task is to aggregate information from multiple data streams, including social media, into a comprehensive, actionable understanding of the basic facts needed to mount a response to an emerging situation. Rather than evaluating these capabilities in English, LORELEI is particularly concerned with advancing human language technology performance for low-resource languages. The combination of domain, genre, and language requirements makes the creation of linguistic resources for LORELEI in general, and for the Situation Frame task in particular, especially challenging. Data is by definition relatively scarce for these languages, and real operational data may be impossible to come by, necessitating the use of “proxy” data sources. The annotation task itself, while superficially straightforward, requires navigating many difficult decisions that involve inference and widespread ambiguity and under-specification in the source data. We introduce the Situation Frame annotation task in the context of the goals of the larger LORELEI program, explore some of the most prevalent annotation challenges, and discuss the impact of various data types on annotation consistency. The data described in this paper will be made available to the wider research community after its use in LORELEI program evaluations.