Performance Technologies for Peta-Scale Systems: A White Paper Prepared by the Performance Evaluation Research Center

Future high-end computing (HEC) initiatives will deploy powerful, large-scale computing platforms that leverage novel component technologies for superior node performance, within advanced system architectures of tens or even hundreds of thousands of nodes. Recent advances in performance tools and modeling methodologies suggest that it is feasible to acquire such systems intelligently and to achieve excellent performance on them, while also significantly reducing the user time required to attain that performance. These developments bear on several aspects of future HEC technology outlined in the recent High-End Computing Revitalization Task Force (HECRTF) white paper request, in particular items 5.4, 5.5, 5.6, and 5.8. We envision the following specific capabilities:

(1) Performance modeling tools, available to researchers and vendors, will extrapolate performance from prototype systems to full-scale systems, and will even accurately predict performance behavior before systems are manufactured. This will enable both improved designs and more intelligent selection of systems in procurements.

(2) System simulation facilities, implemented on highly parallel platforms and available to researchers and vendors, will realistically model, for instance, the performance of a specific interprocessor network design running a specific scientific application code. As with item (1), these facilities can lead both to improved designs and to procurement decisions that yield significantly greater sustained performance on targeted scientific applications.

(3) A program monitoring and analysis infrastructure, scalable to 100,000 processors and beyond, will provide performance information at every level of the system's memory hierarchy and network. By building on knowledge discovery and data mining techniques, this infrastructure will be significantly more scalable and easier to use than the current infrastructure, and a standard version will be incorporated in most high-end systems.
(4) Self-tuning software facilities, currently available only for a few specialized libraries and requiring separate test runs, will be integrated into a broad range of scientific application codes. Eventually these facilities will use the performance monitoring infrastructure described in item (3) and will extend to dynamic optimization at the subroutine level.

Successes in current research projects suggest that each of these capabilities is feasible. However, although prototypes of many of these facilities already exist, significant additional research and development will be required to realize the full potential described above. In the following, we sketch some of this required research.
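
Item (1) rests on fitting an analytic model to runtimes measured on a small configuration and then evaluating that model at full scale. The sketch below illustrates only the fitting-and-extrapolation step, under loudly stated assumptions: the model form T(p) = a + b/p + c*log2(p) (a fixed serial cost, perfectly divisible work, and a tree-structured collective cost), the function names, and the timings are all hypothetical, and the "measurements" are synthetic values generated from known coefficients rather than real data. Practical models are application-specific and must also capture network topology and contention.

```python
import math
from typing import List, Sequence

def basis(p: float) -> List[float]:
    """Terms of the assumed (hypothetical) model T(p) = a + b/p + c*log2(p)."""
    return [1.0, 1.0 / p, math.log2(p)]

def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_model(procs: Sequence[float], times: Sequence[float]) -> List[float]:
    """Least-squares fit of the model coefficients via the normal equations."""
    X = [basis(p) for p in procs]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, times)) for i in range(k)]
    return solve(XtX, Xty)

def predict(coeffs: Sequence[float], p: float) -> float:
    """Evaluate the fitted model at processor count p."""
    return sum(c * f for c, f in zip(coeffs, basis(p)))

# Synthetic prototype-scale "measurements" generated from known coefficients.
true_coeffs = [2.0, 1000.0, 0.5]
procs = [16, 32, 64, 128, 256, 512]
times = [predict(true_coeffs, p) for p in procs]

coeffs = fit_model(procs, times)
print("fitted a, b, c =", coeffs)
print(f"predicted T(100000) = {predict(coeffs, 100_000):.3f} s")
```

The extrapolation is only as trustworthy as the chosen model form: a term that is negligible at 512 processors (here the log2(p) collective cost) can dominate at 100,000, which is why such models are usually validated against several prototype scales before being used in a procurement.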
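
Item (3) expects data mining to make monitoring tractable at 100,000 processors. One simple way to see the idea is to report only the processes whose counter metrics deviate from the rest, instead of presenting every per-process profile. The sketch below swaps in a plain robust statistic (a modified z-score based on the median absolute deviation, with the common 3.5 cutoff); it is not the clustering or classification techniques the text alludes to, and the metric name, ranks, and values are hypothetical.

```python
import statistics
from typing import Dict, List

def flag_outliers(metric_by_rank: Dict[int, float], threshold: float = 3.5) -> List[int]:
    """Return the ranks whose metric lies more than `threshold` robust z-scores
    from the median. The deviation is divided by 1.4826 * MAD (median absolute
    deviation), which approximates the standard deviation for normal data but,
    unlike the standard deviation, is not inflated by the outliers themselves."""
    values = list(metric_by_rank.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the ranks equal the median exactly; report any that differ.
        return sorted(r for r, v in metric_by_rank.items() if v != med)
    scale = 1.4826 * mad
    return sorted(r for r, v in metric_by_rank.items()
                  if abs(v - med) / scale > threshold)

# Hypothetical per-rank L2 miss rates: 1,000 ranks behave alike, two do not.
miss_rate = {r: 0.020 + (r % 10) * 0.0001 for r in range(1000)}
miss_rate[17] = miss_rate[503] = 0.2
print(flag_outliers(miss_rate))
```

Median-based statistics are used here because a handful of extreme ranks would otherwise inflate the mean and standard deviation enough to hide themselves.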
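
Item (4) contrasts today's self-tuning libraries, which tune in separate test runs, with tuning integrated into the application itself. A minimal sketch of that run-time variant follows, assuming the application can supply several interchangeable, side-effect-free implementations of a kernel: the first call for each problem key times every candidate on the real arguments and caches the winner, loosely in the spirit of the "plans" used by adaptive libraries such as FFTW. The class, the function names, and the two dot-product kernels are hypothetical illustrations.

```python
import operator
import time
from typing import Any, Callable, Dict, Hashable, Sequence

def _time_once(fn: Callable[..., Any], args: Sequence[Any]) -> float:
    """Wall-clock time of a single call."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def autotune(candidates: Dict[str, Callable[..., Any]],
             args: Sequence[Any], repeats: int = 3) -> str:
    """Run every candidate on the given arguments and return the name of the
    fastest, taking the best of `repeats` runs to damp timing noise."""
    if not candidates:
        raise ValueError("no candidates to tune")
    timings = {name: min(_time_once(fn, args) for _ in range(repeats))
               for name, fn in candidates.items()}
    return min(timings, key=timings.get)

class TunedKernel:
    """Dispatch to the fastest candidate per problem key, tuning on first use
    and remembering the choice in a simple plan cache."""
    def __init__(self, candidates: Dict[str, Callable[..., Any]]):
        self.candidates = candidates
        self.plan: Dict[Hashable, str] = {}

    def __call__(self, key: Hashable, *args: Any) -> Any:
        if key not in self.plan:
            self.plan[key] = autotune(self.candidates, args)
        return self.candidates[self.plan[key]](*args)

def loop_dot(x, y):
    """Dot product with an explicit accumulation loop."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total

def map_dot(x, y):
    """Dot product using the built-in sum over an elementwise product."""
    return sum(map(operator.mul, x, y))

dot = TunedKernel({"loop": loop_dot, "map": map_dot})
xs = [float(i) for i in range(1000)]
ys = [2.0] * 1000
print(dot(("dot", len(xs)), xs, ys), dot.plan)
```

Because tuning executes each candidate several times on the live input, the first call per key is slower than later ones; keying the cache on problem shape lets that cost be paid once per distinct size rather than on every call.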
