Multi-threading in Uni-threaded Processor
Francky Catthoor | Praveen Raghavan | Angeliki Kritikakou | Javed Absar | Andy Lambrechts | Murali Jayapala
[1] Diederik Verkest,et al. Software Simultaneous Multi-Threading, a Technique to Exploit Task-Level Parallelism to Improve Instruction- and Data-Level Parallelism, 2006, PATMOS.
[2] Erik Brockmeyer,et al. Systematic Preprocessing of Data Dependent Constructs for Embedded Systems, 2005, PATMOS.
[3] Stefanos Kaxiras,et al. Comparing power consumption of an SMT and a CMP DSP for mobile phone workloads , 2001, CASES '01.
[4] Gustavo de Veciana,et al. Application-specific clustered VLIW datapaths: early exploration on a parameterized design space , 2002, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[5] Frank Vahid,et al. Synthesis of customized loop caches for core-based embedded systems , 2002, ICCAD 2002.
[6] Luis Piñuel,et al. Optimizing the memory bandwidth with loop morphing, 2004.
[7] Kurt Keutzer,et al. Getting to the bottom of deep submicron II: a global wiring paradigm , 1999, ISPD '99.
[8] Wen-mei W. Hwu,et al. Enhancing loop buffering of media and telecommunications applications using low-overhead predication , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[9] Aviral Shrivastava,et al. An efficient compiler technique for code size reduction using reduced bit-width ISAs , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.
[10] Thomas M. Conte,et al. High-performance and low-cost dual-thread VLIW processor using Weld architecture paradigm , 2005, IEEE Transactions on Parallel and Distributed Systems.
[11] Henk Corporaal,et al. Clustered loop buffer organization for low energy VLIW embedded processors , 2005, IEEE Transactions on Computers.
[12] Sumedh W. Sathaye,et al. Instruction fetch mechanisms for VLIW architectures with compressed encodings , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[13] William J. Dally,et al. Register organization for media processing , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[14] Peter Marwedel,et al. Scratchpad memory: a design alternative for cache on-chip memory in embedded systems , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).
[15] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[16] Sanjay V. Rajopadhye,et al. Generation of Efficient Nested Loops from Polyhedra , 2000, International Journal of Parallel Programming.
[17] Mahmut T. Kandemir,et al. Compiler-directed scratch pad memory optimization for embedded multiprocessors , 2004, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[18] Peter Marwedel,et al. Assigning program and data objects to scratchpad for energy reduction , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.
[19] 小林 悠記. Low power design method for embedded systems using VLIW processor, 2007.