Dynamic software updates for parallel high‐performance applications

Despite using multiple concurrent processors, a typical high‐performance parallel application is long‐running, taking hours or even days to arrive at a solution. To modify a running high‐performance parallel application, the programmer has to stop the computation, change the code, redeploy, and enqueue the updated version to be scheduled to run, wasting both the programmer's time and expensive computing resources. To address these inefficiencies, this article describes how dynamic software updates (DSU) can be used to modify a parallel application on the fly, saving the programmer's time and using computing resources more productively. Updating parallel applications dynamically can thereby reduce the total time that elapses between posing a problem and arriving at a solution, otherwise known as time‐to‐discovery. To explore the benefits of dynamic updates for high‐performance applications, this article takes a two‐pronged approach. First, we describe our experience of building and evaluating a system for dynamically updating applications running on a parallel cluster. We then review a large body of literature describing the state of the art in DSU and point out how this research can be applied to high‐performance applications. Our experimental results indicate that DSU have the potential to become a powerful tool for reducing time‐to‐discovery for high‐performance parallel applications. Copyright © 2010 John Wiley & Sons, Ltd.
