Comparing FPGA vs. custom CMOS and the impact on processor microarchitecture
[1] J.B. Kuang,et al. The design and implementation of double-precision multiplier in a first-generation CELL processor , 2005, 2005 International Conference on Integrated Circuit Design and Technology (ICICDT 2005).
[2] Vaughn Betz,et al. The Stratix II logic and routing architecture , 2005, FPGA '05.
[3] J. Gregory Steffan,et al. The microarchitecture of FPGA-based soft processors , 2005, CASES '05.
[4] A.J. Al-Khalili,et al. Performance of Parallel Prefix Adders implemented with FPGA technology , 2007, 2007 IEEE Northeast Workshop on Circuits and Systems.
[5] Belliappa Kuttanna,et al. A Sub-2 W Low Power IA Processor for Mobile Internet Devices in 45 nm High-k Metal Gate CMOS , 2009, IEEE Journal of Solid-State Circuits.
[6] K. Pagiamtzis,et al. Content-addressable memory (CAM) circuits and architectures: a tutorial and survey , 2006, IEEE Journal of Solid-State Circuits.
[7] P. Bai,et al. An advanced low power, high performance, strained channel 65nm technology , 2005, IEEE International Electron Devices Meeting, 2005. IEDM Technical Digest.
[8] Yuen H. Chan,et al. IBM POWER6 SRAM arrays , 2007, IBM J. Res. Dev..
[9] James E. Smith,et al. Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[10] Paul Metzgen,et al. A high performance 32-bit ALU for programmable logic , 2004, FPGA '04.
[11] Jean-Louis Brelet,et al. Using Virtex-II Block RAM for High Performance Read/Write CAMs , 2002 .
[12] Xiao Yan Zhang,et al. A 270ps 20mW 108-bit End-around Carry Adder for Multiply-Add Fused Floating Point Unit , 2010, J. Signal Process. Syst..
[13] C.C. Chen,et al. 65nm CMOS high speed, general purpose and low power transistor technology for high volume foundry application , 2004, Digest of Technical Papers. 2004 Symposium on VLSI Technology.
[14] Amir Roth,et al. Mini-graph processing , 2008 .
[15] Allan Hartstein,et al. The optimum pipeline depth for a microprocessor , 2002, ISCA.
[16] Sanu Mathew,et al. A 9-GHz 65-nm Intel® Pentium 4 Processor Integer Execution Unit , 2007, IEEE J. Solid State Circuits.
[17] Michael Zhang,et al. Highly-Associative Caches for Low-Power Processors , 2000 .
[18] J.D. Meindl,et al. Optimal interconnection circuits for VLSI , 1985, IEEE Transactions on Electron Devices.
[19] Jian Wang,et al. Godson-3: A Scalable Multicore RISC Processor with x86 Emulation , 2009, IEEE Micro.
[20] G. Palumbo,et al. Interconnect-Aware Design of Fast Large Fan-In CMOS Multiplexers , 2007, IEEE Transactions on Circuits and Systems II: Express Briefs.
[21] A. Kumar,et al. Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip , 2008, IEEE Journal of Solid-State Circuits.
[22] Jonathan Rose,et al. Measuring the Gap Between FPGAs and ASICs , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[23] David G. Chinnery,et al. Closing the Power Gap between ASIC and Custom - Tools and Techniques for Low Power Design , 2005 .
[24] Eric Sprangle,et al. Increasing processor performance by implementing deeper pipelines , 2002, ISCA.
[25] Mateo Valero,et al. A decoupled KILO-instruction processor , 2006, The Twelfth International Symposium on High-Performance Computer Architecture.
[26] David G. Chinnery,et al. Closing the Gap Between ASIC and Custom - Tools and Techniques for High-Performance ASIC Design , 2002 .
[27] Norman P. Jouppi,et al. The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays , 2002, ISCA.
[28] Peter G. Sassone,et al. Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[29] Stamatis Vassiliadis,et al. High-Performance 3-1 Interlock Collapsing ALU's , 1994, IEEE Trans. Computers.
[30] Gurindar S. Sohi. Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers , 1990 .
[31] Stratix II Device Handbook, Volume 1 , 2006 .
[32] D. Jamsek,et al. An 8GHz floating-point multiply , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference.
[33] R. Krishnamurthy,et al. An 8.8GHz 198mW 16x64b 1R/1W variation-tolerant register file in 65nm CMOS , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.
[34] S. Hsu,et al. A 110 GOPS/W 16-bit multiplier and reconfigurable PLA loop in 90-nm CMOS , 2005, IEEE Journal of Solid-State Circuits.
[35] Igor Arsovski,et al. Self-referenced sense amplifier for across-chip-variation immune sensing in high-performance Content-Addressable Memories , 2006, IEEE Custom Integrated Circuits Conference 2006.
[36] Hong Wang,et al. Intel® Atom™ processor core made FPGA-synthesizable , 2009, FPGA '09.
[37] Leland Chang,et al. A 5.3GHz 8T-SRAM with Operation Down to 0.41V in 65nm CMOS , 2007, 2007 IEEE Symposium on VLSI Circuits.
[38] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[39] Xiang Zou,et al. Intel Nehalem processor core made FPGA synthesizable , 2010, FPGA '10.
[40] Azita Emami-Neyestanak,et al. Tertiary-Tree 12-GHz 32-bit Adder in 65nm Technology , 2007, 2007 IEEE International Symposium on Circuits and Systems.
[41] R. J. Joenk,et al. IBM journal of research and development: information for authors , 1978 .
[42] B. Nikolic,et al. A 240ps 64b carry-lookahead adder in 90nm CMOS , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.
[43] R. Chau,et al. A 45nm Logic Technology with High-k+Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry Patterning, and 100% Pb-free Packaging , 2007, 2007 IEEE International Electron Devices Meeting.
[44] P. Bai,et al. A 65nm logic technology featuring 35nm gate lengths, enhanced channel strain, 8 Cu interconnect layers, low-k ILD and 0.57 μm² SRAM cell , 2004, IEDM Technical Digest. IEEE International Electron Devices Meeting, 2004.
[45] J. Rose,et al. Mapping multiplexers onto hard multipliers in FPGAs , 2005, The 3rd International IEEE-NEWCAS Conference.
[46] R.K. Krishnamurthy,et al. A 9-GHz 65-nm Intel® Pentium 4 Processor Integer Execution Unit , 2006, IEEE Journal of Solid-State Circuits.
[47] Michael C. Huang,et al. SEED: Scalable, efficient enforcement of dependences , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[48] Shih-Lien Lu,et al. An FPGA-based Pentium® in a complete desktop system , 2007, FPGA '07.
[49] M. Khellah,et al. A 4.2GHz 0.3mm2 256kb Dual-Vcc SRAM Building Block in 65nm CMOS , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.
[50] J. Gregory Steffan,et al. Efficient multi-ported memories for FPGAs , 2010, FPGA '10.
[51] Rajesh Kumar,et al. A family of 45nm IA processors , 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.
[52] D. Plass,et al. A 5.6GHz 64kB Dual-Read Data Cache for the POWER6™ Processor , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.
[53] Stamatis Vassiliadis,et al. Interlock collapsing ALU for increased instruction-level parallelism , 1992, MICRO.
[54] Andrew R. Pleszkun,et al. Implementing Precise Interrupts in Pipelined Processors , 1988, IEEE Trans. Computers.
[55] Himanshu Kaul,et al. A Dual-Supply 4GHz 13fJ/bit/search 64×128b CAM in 65nm CMOS , 2006, 2006 Proceedings of the 32nd European Solid-State Circuits Conference.
[56] Kieran McLaughlin,et al. Exploring CAM Design For Network Processing Using FPGA Technology , 2006, Advanced Int'l Conference on Telecommunications and Int'l Conference on Internet and Web Applications and Services (AICT-ICIW'06).
[57] Paul Metzgen,et al. Multiplexer restructuring for FPGA implementation cost reduction , 2005, Proceedings. 42nd Design Automation Conference.