Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters

Modeling dynamical systems is an important application class that spans a wide range of disciplines, including biology, chemistry, finance, national security, and health care. Such applications typically involve large-scale, irregular graph processing, which is difficult to scale because the workload evolves over time, communication is irregular, and load is imbalanced. EpiSimdemics is one such application: it simulates epidemic diffusion in extremely large and realistic social contact networks, using a graph-based system that captures the dynamics among co-evolving entities. This paper presents an implementation of EpiSimdemics in Charm++ that enables future research by social, biological, and computational scientists at unprecedented data and system scales. We present new methods for application-specific processing of graph data and demonstrate their effectiveness on a Cray XE6, specifically NCSA's Blue Waters system.
