Intelligent Optimization of Parallel and Distributed Applications

This paper describes a new project that systematically addresses the enormous complexity of mapping applications to current and future parallel platforms. By integrating the system layers (domain-specific environment, application program, compiler, run-time environment, performance models and simulation, and workflow manager) and by following a systematic strategy for application mapping, our approach exploits the vast machine resources available on such platforms to dramatically increase the productivity of application programmers. The project brings together computer scientists from the areas represented by the system layers (i.e., language extensions, compilers, run-time systems, and workflows) with experts in knowledge representation and machine learning. Working with expert domain scientists in molecular dynamics (MD) simulation, we are developing our approach in the context of a specific application class that already targets environments of several hundred processors. In this way, we gain valuable insight into a generalizable strategy while simultaneously producing performance benefits for existing, important applications.
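
To make the mapping strategy concrete, below is a minimal sketch of the model-guided empirical search idea that this kind of systematic application mapping builds on: an analytic performance model prunes the space of candidate code variants, and only the shortlisted variants are actually timed on the target machine. Everything in the sketch is illustrative, not the project's implementation; the blocked kernel, the cost model, and the tile-size parameter space are hypothetical stand-ins.

```python
import time

# Hypothetical MD-style kernel: a blocked pairwise-interaction loop whose
# tile size "b" is the parameter being tuned. All names here are
# illustrative, not part of the project described above.
N = 512
points = [(i * 0.1, i * 0.2, i * 0.3) for i in range(N)]

def run_kernel(b):
    """Blocked pairwise-distance accumulation; b is the tile size."""
    total = 0.0
    for ii in range(0, N, b):
        for jj in range(0, N, b):
            for i in range(ii, min(ii + b, N)):
                xi, yi, zi = points[i]
                for j in range(jj, min(jj + b, N)):
                    xj, yj, zj = points[j]
                    total += (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
    return total

def model_cost(b):
    """Toy performance model: 1/b penalizes loop overhead for small tiles,
    b/256 is a crude proxy for cache pressure on large tiles."""
    return 1.0 / b + b / 256.0

def empirical_search(candidates, keep=3):
    """Model-guided empirical search: rank candidates with the model,
    then time only the most promising ones on the actual machine."""
    shortlist = sorted(candidates, key=model_cost)[:keep]
    timings = {}
    for b in shortlist:
        start = time.perf_counter()
        run_kernel(b)
        timings[b] = time.perf_counter() - start
    return min(timings, key=timings.get), timings

best, timings = empirical_search([8, 16, 32, 64, 128, 256])
print("best tile size:", best)
```

In the integrated system envisioned here, the pruning model would come from the compiler and performance-modeling layers rather than a hand-written formula, and the empirical runs would be dispatched across distributed resources by the workflow manager instead of executing sequentially on one node.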
