Thermal-Aware Design and Runtime Management of 3D Stacked Multiprocessors
[1] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[2] Oded Lempel,et al. 2nd Generation Intel® Core Processor Family: Intel® Core i7, i5 and i3 , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[3] Diana Marculescu,et al. Analysis of dynamic voltage/frequency scaling in chip-multiprocessors , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).
[4] Gabriel H. Loh,et al. Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[5] Kees van Berkel,et al. Multi-core for mobile phones , 2009, DATE.
[6] Alan Gray,et al. Deterministic Parallel Processing , 2006, International Journal of Parallel Programming.
[7] Donald Yeung,et al. The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor , 1991 .
[8] Kofi A. A. Makinwa,et al. A 0.008-mm² area-optimized thermal-diffusivity-based temperature sensor in 160-nm CMOS for SoC thermal monitoring , 2014, ESSCIRC 2014 - 40th European Solid State Circuits Conference (ESSCIRC).
[9] Kevin Skadron,et al. Temperature-aware microarchitecture: Modeling and implementation , 2004, TACO.
[10] Shorin Kyo,et al. In-vehicle vision processors for driver assistance systems , 2008, 2008 Asia and South Pacific Design Automation Conference.
[11] R. S. Jagtap,et al. A Methodology for Early Exploration of TSV Interconnects in 3D Stacked ICs , 2011 .
[12] Chin-Chung Tsai,et al. A time-to-digital-converter-based CMOS smart temperature sensor , 2005, 2005 IEEE International Symposium on Circuits and Systems.
[13] W. Van Teijlingen,et al. Determining Performance Boundaries and Automatic Loop Optimization of High-Level System Specifications , 2014 .
[14] Anant Agarwal,et al. rMPI: Message Passing on Multicore Processors with On-Chip Interconnect , 2008, HiPEAC.
[15] José Ignacio Hidalgo,et al. 3D thermal-aware floorplanner using a MOEA approximation , 2013, Integr..
[16] Stephen W. Keckler,et al. Regional congestion awareness for load balance in networks-on-chip , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[17] Huaxi Gu,et al. DTBR: A dynamic thermal-balance routing algorithm for Network-on-Chip , 2012, Comput. Electr. Eng..
[18] Nikolas Ioannou,et al. Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[19] José González,et al. Thermal-aware clustered microarchitectures , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..
[20] Terrence S. T. Mak,et al. Thermal Optimization in Network-on-Chip-Based 3D Chip Multiprocessors Using Dynamic Programming Networks , 2014, ACM Trans. Embed. Comput. Syst..
[21] Amir Zjajo,et al. A 11 µW 0°C–160°C temperature sensor in 90 nm CMOS for adaptive thermal monitoring of VLSI circuits , 2012, 2012 IEEE International Symposium on Circuits and Systems.
[22] Jason Cong,et al. An automated design flow for 3D microarchitecture evaluation , 2006, Asia and South Pacific Conference on Design Automation, 2006..
[23] Paul D. Franzon,et al. Thermal Pathfinding for 3-D ICs , 2014, IEEE Transactions on Components, Packaging and Manufacturing Technology.
[24] Anton Bakker. CMOS smart temperature sensors - an overview , 2002, Proceedings of IEEE Sensors.
[25] Sri Parameswaran,et al. HitME: Low power Hit MEmory buffer for embedded systems , 2009, 2009 Asia and South Pacific Design Automation Conference.
[26] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[27] An-Yeu Wu,et al. Traffic- and thermal-aware routing for throttled three-dimensional Network-on-Chip systems , 2011, Proceedings of 2011 International Symposium on VLSI Design, Automation and Test.
[28] Michael L. Scott,et al. Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[29] Timothy G. Mattson,et al. Light-weight communications on Intel's single-chip cloud computer processor , 2011, OPSR.
[30] H. Kufluoglu,et al. A Computational Model of NBTI and Hot Carrier Injection Time-Exponents for MOSFET Reliability , 2004 .
[31] Mateo Valero,et al. Software management of selective and dual data caches , 1997 .
[32] Coniferous softwood. GENERAL TERMS , 2003 .
[33] Saurabh Dighe,et al. The 48-core SCC Processor: the Programmer's View , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[34] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[35] Ankush Varma. High-Speed Performance, Power and Thermal Co-simulation For SoC Design , 2007 .
[36] Trevor Mudge,et al. MiBench: A free, commercially representative embedded benchmark suite , 2001 .
[37] Amir Zjajo,et al. Dynamic Thermal Estimation Methodology for High-Performance 3-D MPSoC , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[38] Kevin Kai-Wei Chang,et al. HAT: Heterogeneous Adaptive Throttling for On-Chip Networks , 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.
[39] E. Beyne,et al. Numerical and experimental characterization of the thermal behavior of a packaged DRAM-on-logic stack , 2012, 2012 IEEE 62nd Electronic Components and Technology Conference.
[40] Fabien Clermidy,et al. 3D Embedded multi-core: Some perspectives , 2011, 2011 Design, Automation & Test in Europe.
[41] Alberto Ros,et al. Distance-aware round-robin mapping for large NUCA caches , 2009, 2009 International Conference on High Performance Computing (HiPC).
[42] William J. Dally. Virtual-channel flow control , 1990, ISCA '90.
[43] Scott Hauck,et al. FPGA vs. MPPA for Positron Emission Tomography pulse processing , 2009, 2009 International Conference on Field-Programmable Technology.
[44] Paul D. Franzon,et al. Creating 3D specific systems: Architecture, design and CAD , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).
[45] Paul Marchal,et al. Flexible hardware/software support for message passing on a distributed shared memory architecture , 2005, Design, Automation and Test in Europe.
[46] Alexandra Fedorova,et al. Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.
[47] Kevin Skadron,et al. HotSpot: a compact thermal modeling methodology for early-stage VLSI design , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[48] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[49] David Atienza,et al. Thermal analysis and active cooling management for 3D MPSoCs , 2011, ISCAS.
[50] Paul Chow,et al. TMD-MPI: An MPI Implementation for Multiple Processors Across Multiple FPGAs , 2006, 2006 International Conference on Field Programmable Logic and Applications.
[51] Laxmikant V. Kalé,et al. Topology-aware task mapping for reducing communication contention on large parallel machines , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[52] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[53] Luca Benini,et al. Exploring "temperature-aware" design in low-power MPSoCs , 2006, DATE.
[54] David Blaauw,et al. Nanometer Device Scaling in Subthreshold Circuits , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[55] Alexander V. Veidenbaum,et al. Revisiting level-0 caches in embedded processors , 2012, CASES '12.
[56] Andrew B. Kahng,et al. ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[57] Jean-Luc Gaudiot,et al. Throttling-Based Resource Management in High Performance Multithreaded Architectures , 2006, IEEE Transactions on Computers.
[58] Federico Angiolini,et al. Automated Pathfinding tool chain for 3D-stacked integrated circuits: Practical case study , 2009, 2009 IEEE International Conference on 3D System Integration.
[59] Todd M. Austin,et al. SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.
[60] Seda Ogrenci Memik,et al. Optimizing Thermal Sensor Allocation for Microprocessors , 2008, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[61] Edward A. Lee,et al. Dataflow process networks , 2001 .
[62] David Wentzlaff,et al. Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[63] Philip Jacob,et al. Thermal Modeling of 3-D Stacked DRAM Over SiGe HBT BiCMOS CPU , 2015, IEEE Access.
[64] Zhiyi Yu,et al. A 167-Processor Computational Platform in 65 nm CMOS , 2009, IEEE Journal of Solid-State Circuits.
[65] Nanning Zheng,et al. 3D DRAM Design and Application to 3D Multicore Systems , 2009, IEEE Design & Test of Computers.
[66] Gerald H. Hilderink,et al. Parallel Processing — the picoChip way! , 2003 .
[67] Arvind Sridhar,et al. Thermal modeling and analysis of 3D multi-processor chips , 2010, Integr..
[68] Theocharis Theocharides,et al. Intelligent Hotspot Prediction for Network-on-Chip-Based Multicore Systems , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[69] Narayanan Vijaykrishnan,et al. Variation Impact on SER of Combinational Circuits , 2007, 8th International Symposium on Quality Electronic Design (ISQED'07).
[70] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[71] Florence Maraninchi,et al. Co-simulation of Functional SystemC TLM Models with Power/Thermal Solvers , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[72] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[73] Meeta Sharma Gupta,et al. System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[74] Robert C. Aitken,et al. Low Power Methodology Manual - for System-on-Chip Design , 2007 .
[75] Koen Bertels,et al. QUAD - A Memory Access Pattern Analyser , 2010, ARC.
[76] Amir Zjajo,et al. Physical characterization of steady-state temperature profiles in three-dimensional integrated circuits , 2015, 2015 IEEE International Symposium on Circuits and Systems (ISCAS).
[77] Federico Angiolini,et al. QoS-ocMPI: QoS-aware on-chip Message Passing Library for NoC-based Many-Core MPSoCs , 2010 .
[78] Eby G. Friedman,et al. Thermal conduction path analysis in 3-D ICs , 2014, 2014 IEEE International Symposium on Circuits and Systems (ISCAS).
[79] Amir Fijany,et al. Very low power parallel implementation of stereo vision algorithm on a solar cell powered MIMD many core architecture , 2011, 2011 Aerospace Conference.
[80] Tao Zhang,et al. A customized design of DRAM controller for on-chip 3D DRAM stacking , 2010, IEEE Custom Integrated Circuits Conference 2010.
[81] Petru Eles,et al. On-line thermal aware dynamic voltage scaling for energy optimization with frequency/temperature dependency consideration , 2009, 2009 46th ACM/IEEE Design Automation Conference.
[82] Geoffrey Brown,et al. ρ-VEX: A reconfigurable and extensible softcore VLIW processor , 2008, 2008 International Conference on Field-Programmable Technology.
[83] Liam Madden. Heterogeneous 3-d stacking, can we have the best of both (technology) worlds? , 2013, ISPD '13.
[84] Bart Vandevelde,et al. Fine grain thermal modeling and experimental validation of 3D-ICs , 2011, Microelectron. J..
[85] S. S. Kumar. TMFab: A Transactional Memory Fabric for Chip Multiprocessors , 2010 .
[86] Tao Zhang,et al. 3D-SWIFT: a high-performance 3D-stacked wide IO DRAM , 2014, GLSVLSI '14.
[87] Margaret Martonosi,et al. Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[88] A. Michos. A Novel Concurrent Validation Scheme for Hardware Transactional Memory , 2012 .
[89] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[90] Gernot Heiser,et al. Slow Down or Sleep, That Is the Question , 2011, USENIX Annual Technical Conference.
[91] Sumeet S. Kumar,et al. A 3D Network-on-Chip for stacked-die transactional chip multiprocessors using Through Silicon Vias , 2011, 2011 6th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS).
[92] Kathryn S. McKinley,et al. Hoard: a scalable memory allocator for multithreaded applications , 2000, SIGP.
[93] Saurabh Dighe,et al. A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling , 2011, IEEE Journal of Solid-State Circuits.
[94] Brent E. Nelson,et al. Comparing fine-grained performance on the Ambric MPPA against an FPGA , 2009, 2009 International Conference on Field Programmable Logic and Applications.
[95] Changkyu Kim,et al. Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches , 2003, IEEE Micro.
[96] David Atienza,et al. 3D Thermal-aware floorplanner for many-core single-chip systems , 2011, 2011 12th Latin American Test Workshop (LATW).
[97] Ed F. Deprettere,et al. Daedalus: Toward composable multimedia MP-SoC design , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[98] F. H. Mcmahon,et al. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range , 1986 .
[99] Margaret Martonosi,et al. Voltage and frequency control with adaptive reaction time in multiple-clock-domain processors , 2005, 11th International Symposium on High-Performance Computer Architecture.
[100] Seung Wook Yoon,et al. 3D TSV processes and its assembly/packaging technology , 2009, 2009 IEEE International Conference on 3D System Integration.
[101] Li Shang,et al. PowerHerd: a distributed scheme for dynamically satisfying peak-power constraints in interconnection networks , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[102] Lingjia Tang,et al. Contentiousness vs. sensitivity: improving contention aware runtime systems on multicore architectures , 2011, EXADAPT '11.
[103] Guoping Xu. Evaluation of a Liquid Cooling Concept for High Power Processors , 2007, Twenty-Third Annual IEEE Semiconductor Thermal Measurement and Management Symposium.
[104] Kai Ma,et al. Adaptive Power Control with Online Model Estimation for Chip Multiprocessors , 2011, IEEE Transactions on Parallel and Distributed Systems.
[105] T. Mohsenin,et al. A 167-processor 65 nm computational platform with per-processor dynamic supply voltage and dynamic clock frequency scaling , 2008, 2008 IEEE Symposium on VLSI Circuits.
[106] Luca Benini,et al. HW-SW emulation framework for temperature-aware design in MPSoCs , 2008, TODE.
[107] Shorin Kyo,et al. IMAPCAR: A 100 GOPS In-Vehicle Vision Processor Based on 128 Ring Connected Four-Way VLIW Processing Elements , 2011, J. Signal Process. Syst..
[108] Margaret Martonosi,et al. An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[109] Shekhar Y. Borkar,et al. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.
[110] Jeongho Cho,et al. Test and debug strategy for TSMC CoWoS™ stacking process based heterogeneous 3D IC: A silicon case study , 2013, 2013 IEEE International Test Conference (ITC).
[111] J. W. McPherson,et al. Reliability challenges for 45nm and beyond , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[112] Ge-Ming Chiu,et al. The Odd-Even Turn Model for Adaptive Routing , 2000, IEEE Trans. Parallel Distributed Syst..
[113] Mike Butts,et al. Synchronization through Communication in a Massively Parallel Processor Array , 2007, IEEE Micro.
[114] Keiji Matsumoto,et al. Thermal resistance measurements of interconnections, for the investigation of the thermal resistance of a three-dimensional (3D) chip stack , 2009, 2009 25th Annual IEEE Semiconductor Thermal Measurement and Management Symposium.
[115] William H. Mangione-Smith,et al. The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[116] John Kubiatowicz,et al. Integrated shared-memory and message-passing communication in the Alewife multiprocessor , 1998 .
[117] Luca Benini,et al. A virtual platform environment for exploring power, thermal and reliability management control strategies in high-performance multicores , 2010, GLSVLSI '10.
[118] Pradip Bose,et al. Stretching the limits of clock-gating efficiency in server-class processors , 2005, 11th International Symposium on High-Performance Computer Architecture.
[119] Gabriel H. Loh,et al. 3D-Integrated SRAM Components for High-Performance Microprocessors , 2009, IEEE Transactions on Computers.
[120] Lieven Eeckhout,et al. Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics , 2006, 2006 IEEE International Symposium on Workload Characterization.
[121] Li Shang,et al. Three-Dimensional Chip-Multiprocessor Run-Time Thermal Management , 2008, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[122] Amir Zjajo,et al. System Level Methodology for Interconnect Aware and Temperature Constrained Power Management of 3-D MP-SOCs , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[123] P. Soussan,et al. Comprehensive analysis of the impact of single and arrays of through silicon vias induced stress on high-k / metal gate CMOS performance , 2010, 2010 International Electron Devices Meeting.
[124] David A. Padua,et al. Calculating stack distances efficiently , 2002, MSP/ISMM.
[125] Radhika Sanjeev Jagtap,et al. A Methodology for Early Exploration of TSV Placement Topologies in 3D Stacked ICs , 2012, 2012 15th Euromicro Conference on Digital System Design.
[126] David Atienza,et al. 3D-ICE: Fast compact transient thermal modeling for 3D ICs with inter-tier liquid cooling , 2010, 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[127] Brad Budlong,et al. Reconfigurable Work Farms on a Massively Parallel Processor Array , 2008, 2008 16th International Symposium on Field-Programmable Custom Computing Machines.
[128] Kristof Beyls,et al. Reuse Distance as a Metric for Cache Behavior. , 2001 .
[129] David A. Patterson,et al. Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .
[130] M. Nagata,et al. Limitations, innovations, and challenges of circuits and devices into a half micrometer and beyond , 1992 .
[131] A. Varma,et al. Selective victim caching: a method to improve the performance of direct-mapped caches , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[132] An-Yeu Wu,et al. Traffic- and Thermal-Aware Run-Time Thermal Management Scheme for 3D NoC Systems , 2010, 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip.
[133] J. De Klerk. Cache Balancer: A communication latency and utilization aware resource manager , 2014 .
[134] C. Feenstra. A Memory Access and Operator Usage Profiler Framework for HLS Optimization: Using the Lucas Optical Flow Algorithm as Case Study , 2011 .
[135] Robert S. Patti. Three-Dimensional Integrated Circuits and the Future of System-on-Chip Designs , 2006 .