Performance Engineering: Understanding and Improving the Performance of Large-Scale Codes

Achieving good performance on high-end computing systems is growing ever more challenging due to enormous scale, increasing architectural complexity, and increasing application complexity. To address these challenges in DOE's SciDAC-2 program, the Performance Engineering Research Institute (PERI) has embarked on an ambitious research plan encompassing performance modeling and prediction, automatic performance optimization, and performance engineering of high-profile applications. The principal new component is a research activity in automatic tuning software, spurred by the strong user preference for automatic tools.
