OpenCL-based design methodology for application-specific processors
Jarmo Takala | Pekka Jääskeläinen | Pablo Huerta | Carlos S. de La Lama
[1] Jarmo Takala, et al. Reducing processor energy consumption by compiler optimization, 2009, 2009 IEEE Workshop on Signal Processing Systems.
[2] Vivek Tiwari, et al. Reducing power in high-performance microprocessors, 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).
[3] Henk Corporaal, et al. Register file port requirements of transport triggered architectures, 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.
[4] Vivek Sarkar, et al. Linear scan register allocation, 1999, TOPLAS.
[5] Vikas Agarwal, et al. Clock rate versus IPC: the end of the road for conventional microarchitectures, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[6] Jarmo Takala, et al. Programmable and Scalable Architecture for Graphics Processing Units, 2009, SAMOS.
[7] Guido Bertoni, et al. Efficient Software Implementation of AES on 32-Bit Platforms, 2002, CHES.
[8] Frederico Pratas, et al. Applying the Stream-Based Computing Model to Design Hardware Accelerators: A Case Study, 2009, SAMOS.
[9] Lawrence Rauchwerger, et al. Automatic Detection of Parallelism: A grand challenge for high performance computing, 1994, IEEE Parallel & Distributed Technology: Systems & Applications.
[10] Frances E. Allen, et al. Control-flow analysis, 1970.
[11] Dennis Ritchie, et al. The development of the C language, 1993, HOPL-II.
[12] Ken Kennedy, et al. Conversion of control dependence to data dependence, 1983, POPL '83.
[13] Scott A. Mahlke, et al. Predicate-aware scheduling: a technique for reducing resource constraints, 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[14] Henk Corporaal, et al. TTAs: Missing the ILP complexity wall, 1999, J. Syst. Archit.
[15] Jason Cong, et al. FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs, 2009, 2009 IEEE 7th Symposium on Application Specific Processors.
[16] Henk Corporaal, et al. Partitioned register file for TTAs, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[17] Henk Corporaal. Microprocessor architectures - from VLIW to TTA, 1997.
[18] Jarmo Takala, et al. Codesign toolset for application-specific instruction-set processors, 2007, Electronic Imaging.
[19] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[20] Henk Corporaal, et al. Automatic Synthesis of Transport Triggered Processors, 1995.
[21] Wen-mei W. Hwu, et al. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, 2008, LCPC.