Landing stencil code on Godson-T
[1] Saurabh Dighe, et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS, 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[2] Guang R. Gao, et al. Mapping the LU decomposition on a many-core architecture: challenges and solutions, 2009, CF '09.
[3] William J. Dally, et al. Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[4] Uday Bondhugula, et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, 2008, PPoPP.
[5] Guang R. Gao, et al. Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures, 2007, ISCA '07.
[6] William J. Dally, et al. The message-driven processor: a multicomputer processing node with efficient mechanisms, 1992, IEEE Micro.
[7] Huang He. Architecture Supported Synchronization-Based Cache Coherence Protocol for Many-Core Processors, 2009.
[8] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[9] Donald Yeung, et al. Low-Cost Support for Fine-Grain Synchronization in Multiprocessors, 1992, Multithreaded Computer Architecture.
[10] Henry P. Moreton, et al. The GeForce 6800, 2005, IEEE Micro.
[11] Guang R. Gao, et al. Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences, 2006, Euro-Par.
[12] Burton J. Smith, et al. The architecture of HEP, 1985.
[13] Pradeep Dubey, et al. Platform 2015: Intel® Processor and Platform Evolution for the Next Decade, 2005.
[14] Allan Porterfield, et al. The Tera computer system, 1990.
[15] H. Peter Hofstee, et al. Power efficient processor architecture and the cell processor, 2005, 11th International Symposium on High-Performance Computer Architecture.
[16] Dongrui Fan, et al. A Performance Model of Dense Matrix Operations on Many-Core Architectures, 2008, Euro-Par.
[17] Samuel Williams, et al. Implicit and explicit optimizations for stencil computations, 2006, MSPC '06.
[18] Jung Ho Ahn, et al. Merrimac: Supercomputing with Streams, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[19] Samuel Williams, et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[20] Long Chen, et al. Performance Tuning of the Fast Fourier Transform on a Multi-core Architecture, 2008.
[21] David G. Wonnacott, et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[22] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[23] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[24] Chau-Wen Tseng, et al. Compiler optimizations for eliminating barrier synchronization, 1995, PPOPP '95.
[25] Sanjay V. Rajopadhye, et al. Towards Optimal Multi-level Tiling for Stencil Computations, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[26] Dominique Lavenier, et al. Efficient Parallelization of a Protein Sequence Comparison Algorithm on Manycore Architecture, 2008, 2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies.
[27] Volker Strumpen, et al. The memory behavior of cache oblivious stencil computations, 2007, The Journal of Supercomputing.
[28] Guang R. Gao, et al. Experience on optimizing irregular computation for memory hierarchy in manycore architecture, 2008, ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming.
[29] William J. Dally, et al. The message-driven processor, 1992.
[30] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[31] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[32] William J. Dally. Computer Architecture in the Many-Core Era, 2006, 2006 International Conference on Computer Design.