Paper Information - Subword and Spatiotemporal Models for Identifying Actionable Information in Haitian Kreyol

Subword and Spatiotemporal Models for Identifying Actionable Information in Haitian Kreyol

Crisis-affected populations are often able to maintain digital communications, but in a sudden-onset crisis aid organizations will have the least free resources to process such communications. Information that aid agencies can actually act on, 'actionable' information, will be sparse, so there is great potential to (semi)automatically identify actionable communications. However, there are hurdles: the languages spoken will often be under-resourced and have orthographic variation, and the precise definition of 'actionable' will be response-specific and evolving. We present a novel system that addresses this, drawing on 40,000 emergency text messages sent in Haiti following the January 12, 2010 earthquake, predominantly in Haitian Kreyol. We show that keyword/ngram-based models using streaming MaxEnt achieve an F-score of up to F=0.21. Further, we find that current state-of-the-art subword models increase this substantially to F=0.33, while modeling the spatial, temporal, topic and source contexts of the messages can increase this to a very accurate F=0.86 over direct text messages and F=0.90-0.97 over social media, making it a viable strategy for message prioritization.

Robert Munro
