Innovative Research and Applications in Next-Generation High Performance Computing

[1]  Luca Benini,et al.  Cycle-accurate simulation of energy consumption in embedded systems , 1999, DAC '99.

[2]  Norman P. Jouppi,et al.  A Multi-Core Approach to Addressing the Energy-Complexity Problem in Microprocessors , 2003 .

[3]  Klaus D. McDonald-Maier,et al.  Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors , 2009, EURASIP J. Embed. Syst..

[4]  Andrew Brownsword,et al.  Hardware transactional memory for GPU architectures , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[5]  Maged M. Michael,et al.  Transactional memory support in the IBM POWER8 processor , 2015, IBM J. Res. Dev..

[6]  Josep Torrellas,et al.  Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[7]  Chi Cao Minh,et al.  Designing an effective hybrid Transactional Memory system , 2008 .

[8]  Sarma B. K. Vrudhula,et al.  Energy-Efficient Operation of Multicore Processors by DVFS, Task Migration, and Active Cooling , 2014, IEEE Transactions on Computers.

[9]  Kevin J. Nowka,et al.  Enhanced Leakage Reduction Techniques Using Intermediate Strength Power Gating , 2007, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[10]  Pradip Bose,et al.  A case for guarded power gating for multi-core processors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[11]  Maurice Herlihy,et al.  Virtualizing transactional memory , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[12]  Rupesh S. Shelar  A Fast and Near-Optimal Clustering Algorithm for Low-Power Clock Tree Synthesis , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[13]  Marc Lupon Navazo  Hardware Approaches for Transactional Memory , 2008 .

[14]  Sandeep K. Shukla,et al.  The Model Checking View to Clock Gating and Operand Isolation , 2010, 2010 10th International Conference on Application of Concurrency to System Design.

[15]  D. Geer,et al.  Chip makers turn to multicore processors , 2005, Computer.

[16]  James R. Goodman,et al.  Transactional lock-free execution of lock-based programs , 2002, ASPLOS X.

[17]  Kunle Olukotun,et al.  Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[18]  Bruce Jacob,et al.  Memory Systems: Cache, DRAM, Disk , 2007 .

[19]  Kevin E. Moore,et al.  Log-based transactional memory , 2007 .

[20]  Youngsoo Shin,et al.  Synthesis of clock gating logic through factored form matching , 2012, 2012 IEEE International Conference on IC Design & Technology.

[21]  K. Ghose,et al.  Modeling Energy Dissipation in Low Power Caches , 1998 .

[22]  Craig B. Zilles,et al.  Transactional memory and the birthday paradox , 2007, SPAA '07.

[23]  James R. Goodman,et al.  Speculative lock elision: enabling highly concurrent multithreaded execution , 2001, MICRO.

[24]  Manuel E. Acacio,et al.  Dynamic Serialization: Improving Energy Consumption in Eager-Eager Hardware Transactional Memory Systems , 2012, 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing.

[25]  David A. Wood,et al.  LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[26]  Rachid Guerraoui,et al.  Transactional Memory. Foundations, Algorithms, Tools, and Applications , 2015, Lecture Notes in Computer Science.

[27]  Nancy Hitschfeld-Kahler,et al.  A Survey on Parallel Computing and its Applications in Data-Parallel Problems Using GPU Architectures , 2014 .

[28]  Jan-Willem Maessen,et al.  Split hardware transactions: true nesting of transactions using best-effort hardware transactional memory , 2008, PPOPP.

[29]  Jinson Koppanalil,et al.  A 1.6 GHz dual-core ARM Cortex A9 implementation on a low power high-K metal gate 32nm process , 2011, Proceedings of 2011 International Symposium on VLSI Design, Automation and Test.

[30]  Osman S. Unsal,et al.  A Low-Overhead Profiling and Visualization Framework for Hybrid Transactional Memory , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.

[31]  Andrew B. Kahng,et al.  MAPG: Memory access power gating , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[32]  David B. Lomet,et al.  Process structuring, synchronization, and recovery using atomic actions , 1977, Language Design for Reliable Software.

[33]  Christos Kozyrakis,et al.  Architectures for transactional memory , 2009 .

[34]  Li Li,et al.  Automatic Register Transfer level CAD tool design for advanced clock gating and low power schemes , 2012, 2012 International SoC Design Conference (ISOCC).

[35]  Alvin M. Despain,et al.  Cache design trade-offs for power and performance optimization: a case study , 1995, ISLPED '95.

[36]  Manish Gupta,et al.  Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors , 2000, IEEE Micro.

[37]  William H. Mangione-Smith,et al.  The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[38]  Rung-Bin Lin,et al.  Clock gating optimization with delay-matching , 2011, 2011 Design, Automation & Test in Europe.

[39]  Nabendu Chaki,et al.  A Lightweight Implementation of Obstruction-Free Software Transactional Memory , 2014, ACSS.

[40]  Massoud Pedram,et al.  Design of a Tri-Modal Multi-Threshold CMOS Switch With Application to Data Retentive Power Gating , 2012, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[41]  Anoop Gupta,et al.  A parallel adaptive fast multipole method , 1993, Supercomputing '93. Proceedings.

[42]  Ali-Reza Adl-Tabatabai,et al.  Unbounded transactional memory systems , 2006 .

[43]  Maurice Herlihy  Fun with hardware transactional memory , 2014, SIGMOD Conference.

[44]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[45]  David Blaauw,et al.  Millimeter-scale nearly perpetual sensor system with stacked battery and solar cells , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[46]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[47]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[48]  Mark Moir,et al.  The adaptive transactional memory test platform: a tool for experimenting with transactional code for rock (poster) , 2008, SPAA '08.

[49]  Amin Firoozshahian,et al.  Smart memories: A reconfigurable memory system architecture , 2009 .

[50]  Wei Wang,et al.  SeSCG: Selective sequential clock gating for ultra-low-power multimedia mobile processor design , 2010, 2010 IEEE International Conference on Electro/Information Technology.

[51]  Chung-Hsun Huang,et al.  A fast wake-up power gating technique with inducing a balanced rush current , 2012, 2012 IEEE International Symposium on Circuits and Systems.

[52]  Hideharu Amano,et al.  Trade-off analysis of fine-grained power gating methods for functional units in a CPU , 2012, 2012 IEEE COOL Chips XV.

[53]  Jörg Henkel,et al.  A framework for estimation and minimizing energy dissipation of embedded HW/SW systems , 1998, DAC.

[54]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[55]  Hiroshi Nakamura,et al.  Stepwise sleep depth control for run-time leakage power saving , 2012, GLSVLSI '12.

[56]  Per Stenström,et al.  Eager Beats Lazy: Improving Store Management in Eager Hardware Transactional Memory , 2013, IEEE Transactions on Parallel and Distributed Systems.

[57]  Gerhard Wellein,et al.  Exploring performance and power properties of modern multi‐core chips via simple machine models , 2012, Concurr. Comput. Pract. Exp..

[58]  E. Boemo,et al.  Clock gating and clock enable for FPGA power reduction , 2012, 2012 VIII Southern Conference on Programmable Logic.

[59]  David A. Wood,et al.  LogTM: log-based transactional memory , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[60]  Masanori Hariyama,et al.  A Low-Power FPGA Based on Autonomous Fine-Grain Power Gating , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[61]  Samuel Naffziger,et al.  An x86-64 core implemented in 32nm SOI CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[62]  Ligang Hou,et al.  Clock gating - A power optimization technique for smart card , 2014, 2014 12th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT).

[63]  Thomas F. Knight  An architecture for mostly functional languages , 1986, LFP '86.

[64]  Pradip Bose,et al.  Guarded power gating in a multi-core setting , 2010, ISCA'10.

[65]  Rahul M. Rao,et al.  Power optimization methodology for the IBM POWER7 microprocessor , 2011 .

[66]  Viktor Leis,et al.  Exploiting hardware transactional memory in main-memory databases , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[67]  Mark Moir,et al.  Early experience with a commercial hardware transactional memory implementation , 2009, ASPLOS.

[68]  Steven A. Przybylski,et al.  Cache and memory hierarchy design: a performance-directed approach , 1990 .

[69]  Jindong Tan,et al.  RT-ROS: A real-time ROS architecture on multi-core processors , 2016, Future Gener. Comput. Syst..

[70]  Andrew Brownsword,et al.  Kilo TM: Hardware Transactional Memory for GPU Architectures , 2012, IEEE Micro.

[71]  Luca Benini,et al.  Clock-tree power optimization based on RTL clock-gating , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[72]  Maurice Herlihy  Transactional Memory Today , 2010, ICDCIT.

[73]  Alice Wang,et al.  Adaptive Techniques for Dynamic Processor Optimization: Theory and Practice , 2008 .

[74]  David Blaauw,et al.  Sleep Mode Analysis and Optimization With Minimal-Sized Power Gating Switch for Ultra-Low $V_{dd}$ Operation , 2012, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[75]  Hyesoon Kim,et al.  An integrated GPU power and performance model , 2010, ISCA.

[76]  Anantha Chandrakasan,et al.  A Resolution-Reconfigurable 5-to-10-Bit 0.4-to-1 V Power Scalable SAR ADC for Sensor Applications , 2013, IEEE Journal of Solid-State Circuits.

[77]  David A. Wood,et al.  Supporting nested transactional memory in logTM , 2006, ASPLOS XII.

[78]  Klaus D. McDonald-Maier,et al.  Analytical Evaluation of Energy and Throughput for Multilevel Caches , 2010, 2010 12th International Conference on Computer Modelling and Simulation.

[79]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1990 .

[80]  Shih-Hsu Huang,et al.  High-Level Synthesis for Minimum-Area Low-Power Clock Gating , 2012, J. Inf. Sci. Eng..

[81]  Krishnendu Chakrabarty,et al.  A Robust and Reconfigurable Multi-mode Power Gating Architecture , 2011, 2011 24th International Conference on VLSI Design.

[82]  Mateo Valero,et al.  A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality , 1995, International Conference on Supercomputing.

[83]  Victor Alessandrini  Shared Memory Application Programming: Concepts and Strategies in Multicore Application Programming , 2015 .

[84]  Indrani Paul,et al.  A comparison of core power gating strategies implemented in modern hardware , 2014, SIGMETRICS '14.

[85]  Donald E. Porter,et al.  MetaTM/TxLinux: Transactional Memory for an Operating System , 2007, IEEE Micro.

[86]  Youngsoo Shin,et al.  Synthesis and implementation of active mode power gating circuits , 2010, Design Automation Conference.

[87]  David A. Wood,et al.  TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory , 2008, 2008 International Symposium on Computer Architecture.

[88]  Laurence Pierre,et al.  Runtime Verification of Typical Requirements for a Space Critical SoC Platform , 2011, FMICS.

[89]  Srinivas Katkoori,et al.  State-Retentive Power Gating of Register Files in Multicore Processors Featuring Multithreaded In-Order Cores , 2011, IEEE Transactions on Computers.

[90]  Chong-Min Kyung,et al.  Temperature-Aware Integrated DVFS and Power Gating for Executing Tasks With Runtime Distribution , 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[91]  Bratin Saha,et al.  McRT-STM: a high performance software transactional memory system for a multi-core runtime , 2006, PPoPP '06.

[92]  Matt T. Yourst  PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[93]  Antony L. Hosking,et al.  Nested transactional memory: Model and architecture sketches , 2006, Sci. Comput. Program..

[94]  David A. Wood,et al.  Performance Pathologies in Hardware Transactional Memory , 2007, IEEE Micro.

[95]  Hiroshi Nakamura,et al.  Efficient leakage power saving by sleep depth controlling for Multi-mode Power Gating , 2012, Thirteenth International Symposium on Quality Electronic Design (ISQED).

[96]  Samuel Naffziger,et al.  Design and implementation of soft-edge flip-flops for x86-64 AMD microprocessor modules , 2012, Proceedings of the IEEE 2012 Custom Integrated Circuits Conference.

[97]  Tsai-Ming Hsieh,et al.  Clock tree construction using gated clock cloning , 2012, 2012 4th Asia Symposium on Quality Electronic Design (ASQED).