Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators
Christopher Batten | Krste Asanovic | Yunsup Lee | Derek Lockhart | Rimas Avizienis | Alex Bishara | Richard Xia
[1] Vladimir M. Pentkovski,et al. Implementing Streaming SIMD Extensions on the Pentium III Processor , 2000, IEEE Micro.
[2] John Goodacre,et al. Parallelism and the ARM instruction set architecture , 2005, Computer.
[3] Karthikeyan Sankaralingam,et al. Universal Mechanisms for Data-Parallel Architectures , 2003, MICRO.
[4] Norman P. Jouppi,et al. CACTI 6.0: A Tool to Model Large Caches , 2009 .
[5] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[6] Michael J. Flynn,et al. Very high-speed computing systems , 1966 .
[7] Krste Asanovic,et al. Compiling for vector-thread architectures , 2008, CGO '08.
[8] Ashok Kumar,et al. An 8-Core 64-Thread 64b Power-Efficient SPARC SoC , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[9] Philipp Slusallek,et al. RPU: a programmable ray processing unit for realtime ray tracing , 2005, ACM Trans. Graph..
[10] Hunter Scales,et al. AltiVec Extension to PowerPC Accelerates Media Processing , 2000, IEEE Micro.
[11] Werner Buchholz. The IBM System/370 Vector Architecture , 1986, IBM Syst. J..
[12] Christopher Batten,et al. Cache Refill/Access Decoupling for Vector Machines , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[13] Ozalp Babaoglu,et al. ACM Transactions on Computer Systems , 2007 .
[14] Marc Tremblay,et al. VIS speeds new media processing , 1996, IEEE Micro.
[15] Corinna G. Lee,et al. A Vectorizing SUIF Compiler , 1997 .
[16] Samuel Williams,et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures , 2008 .
[17] Sanjay J. Patel,et al. A Task-Centric Memory Model for Scalable Accelerator Architectures , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[18] Sanjay J. Patel,et al. Tradeoffs in designing accelerator architectures for visual computing , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[19] Martin Hopkins,et al. Synergistic Processing in Cell's Multicore Architecture , 2006, IEEE Micro.
[20] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[21] Christoforos E. Kozyrakis,et al. Vector Lane Threading , 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[22] Hiroaki Kobayashi,et al. Performance evaluation of NEC SX-9 using real science and engineering applications , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[23] Randy Allen,et al. Optimizing Compilers for Modern Architectures , 2004 .
[24] P. Slusallek,et al. RPU: a programmable ray processing unit for realtime ray tracing , 2005, SIGGRAPH '05.
[25] Christopher Batten,et al. The vector-thread architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[26] James Reinders,et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .
[27] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[28] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[29] Uri C. Weiser,et al. MMX technology extension to the Intel architecture , 1996, IEEE Micro.
[30] Sanjay J. Patel,et al. Rigel: an architecture and scalable programming interface for a 1000-core accelerator , 2009, ISCA '09.
[31] Christopher Batten,et al. Implementing the Scale vector-thread processor , 2008, TODE.
[32] Hiroshi Tamura,et al. FACOM VP-100/200: Supercomputers with ease of use , 1985, Parallel Comput..
[33] Ronny Krashinsky. Vector-thread architecture and implementation , 2007 .
[34] Guy E. Blelloch,et al. Radix sort for vector multiprocessors , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[35] Christopher Batten,et al. Simplified vector-thread architectures for flexible and efficient data-parallel accelerators , 2010 .
[36] David F. Bacon,et al. Compiler transformations for high-performance computing , 1994, CSUR.
[37] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[38] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[39] H. P. Peterson,et al. A functional description of the Lincoln TX-2 computer , 1957, IRE-AIEE-ACM '57 (Western).
[40] Ruby B. Lee. Subword parallelism with MAX-2 , 1996, IEEE Micro.
[41] Tor M. Aamodt,et al. Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware , 2009, TACO.
[42] Yunsup Lee. Efficient VLSI Implementations of Vector-Thread Architectures , 2011 .
[43] Mateo Valero,et al. Decoupled vector architectures , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[44] Noah Treuhaft,et al. Scalable Processors in the Billion-Transistor Era: IRAM , 1997, Computer.
[45] James E. Smith,et al. Vector instruction set support for conditional operations , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[46] Brian Kingsbury,et al. Spert-II: A Vector Microprocessor System , 1996, Computer.
[47] John Wawrzynek,et al. Vector microprocessors , 1998 .
[48] Aaftab Munshi,et al. The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[49] Steve Scott,et al. The Cray BlackWidow: a highly scalable vector multiprocessor , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).