Performance Modeling and Analysis of a Massively Parallel Direct—Part 2

Modeling and analysis techniques are used to investigate the performance of a massively parallel version of DIRECT, a global search algorithm widely used in multidisciplinary design optimization applications. Several high-dimensional benchmark functions and real-world problems are used to test the effectiveness of the design under various problem structures. In this second part of a two-part work, theoretical and experimental results are compared for two parallel clusters with different system scales and network connectivities. The first part studied performance sensitivity to important parameters for problem configurations and parallel schemes, using performance metrics such as memory usage, load balancing, and parallel efficiency. Here linear regression models are used to characterize two major overhead sources, interprocessor communication and processor idleness, and are also applied to the isoefficiency functions in the scalability analysis. For a variety of high-dimensional problems and large-scale systems, the massively parallel design achieves reasonable performance. The results of the performance study provide guidance for efficient problem and scheme configuration. More importantly, the design considerations and analysis techniques generalize to the transformation of other global search algorithms into effective large-scale parallel optimization tools.
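
As a rough illustration of the kind of analysis described above, the following minimal sketch fits a linear model to an overhead measurement and evaluates the standard isoefficiency relation W = E/(1 - E) * T_o, where W is the useful work, T_o the total overhead, and E the parallel efficiency. It is not the authors' model: the function names, the choice of message size as the regressor, and the data values are all assumptions made for illustration, and the data are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def efficiency(work, overhead):
    """Parallel efficiency E = W / (W + T_o)."""
    return work / (work + overhead)


def isoefficiency_work(overhead, target_eff):
    """Work needed to hold efficiency at target_eff: W = E/(1-E) * T_o."""
    return target_eff / (1.0 - target_eff) * overhead


# Synthetic, illustrative data only (NOT measurements from the paper):
# communication time modeled as startup latency plus per-byte cost.
sizes = [1000, 2000, 4000, 8000]            # hypothetical message sizes (bytes)
times = [5e-5 + 1e-8 * m for m in sizes]    # hypothetical timings (seconds)

latency, per_byte = fit_line(sizes, times)
print(f"fitted latency = {latency:.3e} s, per-byte cost = {per_byte:.3e} s/byte")

# Work required to keep efficiency at 0.8 for a hypothetical overhead of 10 units.
needed = isoefficiency_work(10.0, 0.8)
print(f"work needed = {needed:.1f}, efficiency check = {efficiency(needed, 10.0):.2f}")
```

In practice the overhead would itself be a fitted function of the processor count, so the isoefficiency relation would yield the problem-size growth needed to maintain a fixed efficiency as the system scales.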
