Forecasting Significant Societal Events Using the EMBERS Streaming Predictive Analytics System

Developed under the Intelligence Advanced Research Projects Activity (IARPA) Open Source Indicators program, Early Model Based Event Recognition using Surrogates (EMBERS) is a large-scale big data analytics system for forecasting significant societal events, such as civil unrest, through continuous, automated analysis of large volumes of publicly available data. It has been operational since November 2012 and delivers approximately 50 predictions each day for countries in Latin America. EMBERS is built on a streaming, scalable, loosely coupled, shared-nothing architecture that uses ZeroMQ as its messaging backbone and JSON as its wire data format. It is deployed on Amazon Web Services through an entirely automated deployment process. We describe the architecture of the system, some of the design tradeoffs encountered during development, and specifics of the machine learning models underlying EMBERS. We also present a detailed prospective evaluation of EMBERS in forecasting significant societal events over the past two years.
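As an illustration of what a loosely coupled, shared-nothing stage exchanging JSON messages might look like, the following minimal sketch may help. It is not the EMBERS implementation: the stage name, the message fields (`text`, `matched_keywords`), and the keyword list are hypothetical, and an in-process `queue.Queue` stands in for the ZeroMQ sockets so that the example runs with the Python standard library alone.

```python
import json
import queue


def encode(message: dict) -> bytes:
    """Serialize a message to a UTF-8 JSON frame (the wire format)."""
    return json.dumps(message, sort_keys=True).encode("utf-8")


def decode(frame: bytes) -> dict:
    """Parse a UTF-8 JSON frame back into a dict."""
    return json.loads(frame.decode("utf-8"))


def keyword_stage(inbox: queue.Queue, outbox: queue.Queue, keywords: set) -> None:
    """Hypothetical pipeline stage.

    Reads frames from `inbox` until a None sentinel arrives, and forwards
    only messages whose text mentions a keyword, enriched with the matches.
    The stage keeps no state shared with other stages: everything it needs
    arrives in the message, and everything it produces leaves in a message.
    """
    while True:
        frame = inbox.get()
        if frame is None:          # end-of-stream sentinel (assumption)
            outbox.put(None)
            break
        message = decode(frame)
        text = message.get("text", "").lower()
        hits = sorted(k for k in keywords if k in text)
        if hits:
            message["matched_keywords"] = hits
            outbox.put(encode(message))


def run_demo() -> list:
    """Push three made-up messages through the stage and collect the output."""
    raw, flagged = queue.Queue(), queue.Queue()
    for text in (
        "Convocan protesta en la plaza",
        "Clima soleado hoy",
        "Huelga general y protesta el lunes",
    ):
        raw.put(encode({"text": text}))
    raw.put(None)

    keyword_stage(raw, flagged, {"protesta", "huelga"})

    results = []
    while (frame := flagged.get()) is not None:
        results.append(decode(frame))
    return results


if __name__ == "__main__":
    for item in run_demo():
        print(item)
```

Because each stage only reads and writes self-describing JSON frames, the queues in this sketch could in principle be replaced by network sockets without changing the stage logic, which is the property a shared-nothing design aims for.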
