Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case Study of the Cell BE
Xizhou Feng | Kirk W. Cameron | Dimitrios S. Nikolopoulos | Filip Blagojevic
[1] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[2] Maya Gokhale,et al. Partitioning Hardware and Software for Reconfigurable Supercomputing Applications: A Case Study , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[3] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[4] Chris J. Scheiman,et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation , 1995, SPAA '95.
[5] Samuel Williams,et al. The potential of the cell processor for scientific computing , 2005, CF '06.
[6] Milind Girkar,et al. The hierarchical task graph as a universal intermediate representation , 2007, International Journal of Parallel Programming.
[7] Michael I. Gordon,et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.
[8] Yuan Zhao,et al. Dependence-Based Code Generation for a CELL Processor , 2006, LCPC.
[9] Jaspal Subhlok,et al. Optimal Use of Mixed Task and Data Parallelism for Pipelined Computations , 2000, J. Parallel Distributed Comput..
[10] P.H. Worley,et al. Early Evaluation of the Cray X1 , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[11] Géraud Krawezik,et al. Performance comparison of MPI and three openMP programming styles on shared memory multiprocessors , 2003, SPAA '03.
[12] Eduard Ayguadé,et al. Exploiting multiple levels of parallelism in OpenMP: a case study , 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[13] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[14] Leslie G. Valiant. A bridging model for parallel computation , 1990 .
[15] Csaba Andras Moritz,et al. LoGPC: modeling network contention in message-passing programs , 1998, SIGMETRICS '98/PERFORMANCE '98.
[16] Alexandros Stamatakis,et al. RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[17] Michael Gschwind,et al. Optimizing Compiler for the CELL Processor , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[18] Rosa M. Badia,et al. CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[19] Sadaf R. Alam,et al. Early evaluation of the Cray XT3 , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[20] Fabrizio Petrini,et al. Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[22] X. Feng,et al. PBPI: a High Performance Implementation of Bayesian Phylogenetic Inference , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[23] Roger D. Chamberlain,et al. Highly-Scalable Reconfigurable Computing , 2005 .
[24] Robert Kroeger,et al. A case study in top-down performance estimation for a large-scale parallel application , 2006, PPoPP '06.
[25] Peter M. Athanas,et al. Examining the Viability of FPGA Supercomputing , 2007, EURASIP J. Embed. Syst..
[26] Phillip B. Gibbons. A more practical PRAM model , 1989, SPAA '89.
[27] Kirk W. Cameron,et al. Quantifying locality effect in data access delay: memory logP , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[28] P. Hanrahan,et al. Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[29] Thomas Rauber,et al. Library Support for Hierarchical Multi-Processor Tasks , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[30] Xizhou Feng,et al. Parallel algorithms for Bayesian phylogenetic inference , 2003, J. Parallel Distributed Comput..
[31] Peng-Jun Wan,et al. A Parallel Computational Model for Heterogeneous Clusters , 2006 .
[32] A Reconfigurable Computing Model for Biological Research Application of Smith-Waterman Analysis to Bacterial Genomes , 2003 .
[33] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[34] Csaba Andras Moritz,et al. LoGPC: Modeling Network Contention in Message-Passing Programs , 2001, IEEE Trans. Parallel Distributed Syst..
[35] Alexandros Stamatakis,et al. Dynamic multigrain parallelization on the cell broadband engine , 2007, PPoPP.
[36] Kathryn M. O'Brien,et al. Optimizing the Use of Static Buffers for DMA on a CELL Chip , 2006, LCPC.
[37] Guy E. Blelloch,et al. Implementation of a portable nested data-parallel language , 1993, PPOPP '93.
[38] William Gropp,et al. Reproducible Measurements of MPI Performance Characteristics , 1999, PVM/MPI.
[39] Fumihiko Ino,et al. LogGPS: a parallel computational model for synchronization analysis , 2001, PPoPP '01.
[40] Gerhard Goos,et al. Open Hypermedia Systems and Structural Computing , 2002, Lecture Notes in Computer Science.
[42] Jaspal Subhlok,et al. A new model for integrated nested task and data parallel programming , 1997, PPOPP '97.
[43] Xizhou Feng,et al. Building the Tree of Life on Terascale Systems , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[44] Franck Cappello,et al. MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks , 2000, ACM/IEEE SC 2000 Conference (SC'00).