Detecting Load Imbalance in Massively Parallel Applications Internship Report

[1]  Robert J. Fowler, et al. HPCVIEW: A Tool for Top-down Analysis of Node Performance, 2002, The Journal of Supercomputing.

[2]  Ken Kennedy, et al. Automatic tuning of whole applications using direct search and a performance-based transformation system, 2006, The Journal of Supercomputing.

[3]  William Cyrus Navidi, et al. Statistics for Engineers and Scientists, 2004.

[4]  Wagner Meira, et al. Waiting time analysis and performance visualization in Carnival, 1996, SPDT '96.

[5]  Jack Dongarra, et al. Automating the Large-Scale Collection and Analysis of Performance, 2004.

[6]  Ying Zhang, et al. SvPablo: A Multi-language Performance Analysis System, 1998, Computer Performance Evaluation.

[7]  Anthony P. Reeves, et al. Strategies for Dynamic Load Balancing on Highly Parallel Computers, 1993, IEEE Trans. Parallel Distributed Syst.

[8]  Rick Kufrin, et al. PerfSuite: An Accessible, Open Source Performance Analysis Environment for Linux, 2005.

[9]  Martin Schulz, et al. Scalable load-balance measurement for SPMD codes, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[10]  Thomas J. LeBlanc, et al. Parallel performance prediction using lost cycles analysis, 1994, Proceedings of Supercomputing '94.

[11]  J. Mark Bull, et al. A hierarchical classification of overheads in parallel programs, 1996, Software Engineering for Parallel and Distributed Systems.

[12]  Luiz De Rose, et al. Detecting Application Load Imbalance on High End Massively Parallel Systems, 2007, Euro-Par.

[13]  Marc-André Hermanns, et al. Verifying Causality between Distant Performance Phenomena in Large-Scale MPI Applications, 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.

[14]  C. Glymour, et al. Statistics and Causal Inference, 1985.

[15]  Dror G. Feitelson, et al. Flexible coscheduling: mitigating load imbalance and improving utilization of heterogeneous resources, 2003, Proceedings International Parallel and Distributed Processing Symposium.

[16]  Craig B. Zilles, et al. A criticality analysis of clustering in superscalar processors, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[17]  Martin Schulz, et al. An Open Infrastructure for Scalable, Reconfigurable Analysis, 2008.

[18]  James E. Smith, et al. Complexity-Effective Superscalar Processors, 1997, ISCA.

[19]  Markus Geimer, et al. Scalable Performance Analysis Methods for the Next Generation of Supercomputers, 2008.

[20]  Rizos Sakellariou, et al. Compile-time minimisation of load imbalance in loop nests, 1997, ICS '97.

[21]  Scott Pakin, et al. Identifying and Eliminating the Performance Variability on the ASCI Q Machine, 2003.

[22]  Bernd Mohr, et al. Scalable Parallel Trace-Based Performance Analysis, 2006, PVM/MPI.

[23]  Marcelo H. Cintra, et al. A compiler cost model for speculative parallelization, 2007, TACO.

[24]  Marc-André Hermanns, et al. Verifying Causal Connections between Distant Performance Phenomena in Large-Scale Message-Passing Applications, 2008.

[25]  Allen D. Malony, et al. Design and implementation of a parallel performance data management framework, 2005, 2005 International Conference on Parallel Processing (ICPP'05).

[26]  Franco Zambonelli, et al. Diffusive load-balancing policies for dynamic applications, 1999, IEEE Concurr.

[27]  Allen D. Malony, et al. ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis, 2003, Euro-Par.

[28]  Emery D. Berger, et al. A locality-improving dynamic memory allocator, 2005, MSP '05.