In Situ Workflows at Exascale: System Software to the Rescue

Implementing an in situ workflow involves several challenges related to data placement, task scheduling, efficient communications, scalability, and reliability. Most current implementations provide reasonably performant solutions to these issues by focusing on high-performance communications and low-overhead execution models, at the cost of reliability and flexibility. One of the key design choices in such infrastructures is between a single-program, integrated environment and a multiple-program, connected environment, each with its own strengths and weaknesses. While these approaches may be appropriate for current production systems, the expected characteristics of exascale machines will shift current priorities. After surveying the trade-offs and challenges of the integrated and connected in situ workflow solutions available today, we discuss in this paper how exascale systems will affect those designs. In particular, we identify the features missing from current system-level software that are required for the evolution of in situ workflows toward exascale, and we discuss how system software innovations from the Argo Exascale Computing Project can help address those challenges.
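To make the integrated versus connected design choice concrete, the following is a minimal Python sketch. It is not taken from the paper and is not modeled on any particular framework. The functions `simulate_step`, `analyze`, `integrated_run`, and `connected_run` are hypothetical. In the integrated variant the analysis is called directly from the simulation loop; in the connected variant each timestep is serialized and passed through a bounded channel to a separate consumer. A thread and `queue.Queue` stand in for what would, in a real connected workflow, be separate executables linked by a network or staging transport, so the sketch shows the data flow but not the failure isolation that separate programs provide.

```python
import json
import queue
import threading


def simulate_step(step):
    # Hypothetical toy simulation: one small array of values per timestep.
    return [step * i for i in range(4)]


def analyze(data):
    # Hypothetical toy analysis: reduce one timestep to a single statistic.
    return sum(data) / len(data)


def integrated_run(n_steps):
    # Integrated (single-program) coupling: analysis is a direct function call
    # inside the simulation loop, on the same data in the same address space.
    # No copies or messaging, but the simulation waits for every analysis call.
    return [analyze(simulate_step(step)) for step in range(n_steps)]


def connected_run(n_steps):
    # Connected (multiple-program) coupling, approximated with a thread and a queue.
    # Assumption: a real deployment would use separate executables and a network
    # or staging transport instead of an in-process queue.Queue.
    channel = queue.Queue(maxsize=2)  # bounded buffer: put() blocks when full
    results = []

    def analysis_side():
        # Consumes timesteps on its own schedule until the sentinel arrives.
        while True:
            message = channel.get()
            if message is None:  # sentinel: simulation has finished
                break
            results.append(analyze(json.loads(message)))

    consumer = threading.Thread(target=analysis_side)
    consumer.start()
    for step in range(n_steps):
        # Serialize each timestep to model copying data across a program boundary.
        channel.put(json.dumps(simulate_step(step)))
    channel.put(None)
    consumer.join()
    return results


if __name__ == "__main__":
    print(integrated_run(3))
    print(connected_run(3))
```

Both variants return the same per-step statistics; what differs is where the analysis runs, whether the data is copied, and whether the simulation blocks on every analysis call (integrated) or only when the channel's buffer is full (connected).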
