TMFab: A Transactional Memory Fabric for Chip Multiprocessors

With the performance of single-core processors approaching its limits, research effort is increasingly focused on chip multiprocessors (CMPs). However, the lock-based synchronization methods that parallel computation currently depends on scale poorly and are inherently complex to program with. This thesis applies the concept of transactional memory, implemented within a synthesizable fabric named TMFab that contains all the hardware components needed to prototype a scalable chip multiprocessor. Its processor-independent nature allows any suitable soft-processor core to be instantiated and used inside the fabric without significant modifications to the fabric hardware. The fabric is also scalable on account of its 3D interconnect architecture, which supports die-stacking to add processor cores to the CMP without increasing its area footprint. The hardware transactional memory system of the fabric reduces the performance overheads of transactional operations, allowing transactions to complete execution faster. TMFab is shown to provide a speedup as high as 3.44x for correctly partitioned independent transactions, and it can be used to analyze the points of contention for conflicting transactions. The fabric was synthesized for both Field Programmable Gate Array (FPGA) and 90nm semi-custom targets.