A pattern for efficient parallel computation on multicore processors with scalar operand networks
[1] David R. Martinez,et al. High Performance Embedded Computing Handbook , 2007 .
[2] Henry Hoffmann,et al. A stream compiler for communication-exposed architectures , 2002, ASPLOS X.
[3] Karthikeyan Sankaralingam,et al. A design space evaluation of grid processor architectures , 2001, MICRO.
[4] B. Ramakrishna Rau,et al. Instruction-level parallel processing: History, overview, and perspective , 2005, The Journal of Supercomputing.
[5] R. Brent,et al. Computation of the Singular Value Decomposition Using Mesh-Connected Processors , 1983 .
[6] Thomas R. Gross,et al. Compilation for a high-performance systolic array , 1986, SIGPLAN '86.
[7] Arnold L. Rosenberg,et al. Work-preserving emulations of fixed-connection networks , 1989, STOC '89.
[8] H. T. Kung. Systolic communication , 1988, [1988] Proceedings. International Conference on Systolic Arrays.
[9] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[10] Henry Hoffmann,et al. On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.
[11] H. T. Kung,et al. Warp architecture and implementation , 1998, ISCA '98.
[12] James E. Smith,et al. Decoupled access/execute computer architectures , 1984, TOCS.
[13] 36th International Symposium on Microarchitecture , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[14] Henry Hoffmann,et al. Stream Algorithms and Architecture , 2004, J. Instr. Level Parallelism.
[15] H. T. Kung. Warp experience: we can map computations onto a parallel computer efficiently , 1988, ICS '88.
[16] Kurt Keutzer,et al. A design pattern language for engineering (parallel) software: merging the PLPP and OPL projects , 2010, ParaPLoP '10.
[17] Henry Hoffmann,et al. Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[18] Sivan Toledo,et al. A survey of out-of-core algorithms in numerical linear algebra , 1999, External Memory Algorithms.
[19] Onur Mutlu,et al. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.
[20] John G. McWhirter,et al. From Bit Level Systolic Arrays to HDTV Processor Chips , 2006, ASAP.
[21] T. Gross,et al. iWarp: anatomy of a parallel computing system , 1999, IEEE Concurrency.
[22] F. Leighton,et al. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes , 1991 .
[23] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.
[24] W. Daniel Hillis,et al. The connection machine , 1985 .
[25] Anant Agarwal,et al. Scalar operand networks: on-chip interconnect for ILP in partitioned architectures , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[26] David E. Foulser,et al. The Saxpy Matrix-1: A General-Purpose Systolic Computer , 1987, Computer.
[27] Venkatesh Akella,et al. Synchroscalar: a multiple clock domain, power-aware, tile-based embedded processor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[28] H. T. Kung,et al. Architecture of the PSC-a programmable systolic chip , 1983, ISCA '83.
[29] K. Keutzer,et al. Our Pattern Language (OPL): A Design Pattern Language for Engineering (Parallel) Software , 2009 .
[30] Christopher Batten,et al. The vector-thread architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[31] Thomas R. Gross,et al. Communication styles for parallel systems , 1994, Computer.
[32] Sun-Yuan Kung,et al. Wavefront Array Processor: Architecture, Language and Applications , 1982 .