Deconstructing the overhead in parallel applications
Alexandra Fedorova | Mark Roth | Micah J. Best | Craig Mustard
[1] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[2] Laxmi N. Bhuyan, et al. Thread reinforcer: Dynamically determining number of threads via OS level monitoring, 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[3] Francisco J. Cazorla, et al. Optimal task assignment in multithreaded processors: a statistical approach, 2012, ASPLOS XVII.
[4] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[5] Ryan Johnson, et al. Decoupling contention management from scheduling, 2010, ASPLOS XV.
[6] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[7] Archana Ganapathi, et al. A case for machine learning to optimize multicore performance, 2009.
[8] Yale N. Patt, et al. Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs, 2008, ASPLOS.
[9] Alexandra Fedorova, et al. Addressing shared resource contention in multicore processors via scheduling, 2010, ASPLOS XV.
[10] Sally A. McKee, et al. Understanding PARSEC performance on contemporary CMPs, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[11] J. Ortega, et al. A multi-color SOR method for parallel computation, 1982, ICPP.
[12] Eric A. Brewer, et al. High-level optimization via automated statistical modeling, 1995, PPOPP '95.
[13] Stijn Eyerman, et al. Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications, 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[14] Thomas H. Dunigan. Kendall Square Multiprocessor: Early Experiences and Performance, 1992.
[15] Mark Crovella, et al. Parallel performance using lost cycles analysis, 1994, SC.
[16] L. Dagum, et al. OpenMP: an industry standard API for shared-memory programming, 1998.
[17] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[18] Robert Tappan Morris, et al. Locating cache performance bottlenecks using data profiling, 2010, EuroSys '10.