A Low-Overhead Profiling and Visualization Framework for Hybrid Transactional Memory

Multi-core prototyping presents a good opportunity to establish low-overhead, detailed profiling and visualization for studying new research topics. In this paper, we design and implement a profiling mechanism with low execution and area overhead, together with a visualization tool, for observing Transactional Memory (TM) behavior on an FPGA. To achieve this, we generate events non-disruptively on the fly, bring them out of the FPGA, and process them offline on a host. There, our tool regenerates the execution from the collected events and produces traces for comprehensively inspecting the behavior of interacting multithreaded programs. With zero execution overhead for hardware TM events, single-instruction overhead for software TM events, and a logic area cost of only 2.3% per processor core, we run TM benchmarks to evaluate different levels of profiling detail, observing an average runtime overhead of 6%. We demonstrate the usefulness of such detailed examination of SW/HW transactional behavior in two ways: (i) we speed up a TM benchmark by 24.1%, and (ii) we closely inspect transactions to point out pathologies.
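
As a rough illustration of the offline step described above, the following minimal sketch rebuilds per-core transaction intervals from a list of timestamped events. It is not the paper's tool: the `Event` fields, the event names (`begin`, `commit`, `abort`), and the helper functions `rebuild_transactions` and `abort_ratio` are all hypothetical, chosen only to show how collected events could be turned back into a trace on the host.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    # Hypothetical event record; the real event format is not given in the text.
    cycle: int  # timestamp in cycles
    core: int   # id of the core that produced the event
    kind: str   # assumed names: "begin", "commit", "abort"


def rebuild_transactions(events):
    """Pair each 'begin' with the next 'commit' or 'abort' on the same core."""
    open_tx = {}                # core -> cycle of the currently open transaction
    traces = defaultdict(list)  # core -> list of (start, end, outcome)
    for ev in sorted(events, key=lambda e: e.cycle):
        if ev.kind == "begin":
            open_tx[ev.core] = ev.cycle
        elif ev.kind in ("commit", "abort") and ev.core in open_tx:
            start = open_tx.pop(ev.core)
            traces[ev.core].append((start, ev.cycle, ev.kind))
    return dict(traces)


def abort_ratio(traces):
    """Fraction of reconstructed transactions that ended in an abort."""
    outcomes = [outcome for txs in traces.values() for (_, _, outcome) in txs]
    return outcomes.count("abort") / len(outcomes) if outcomes else 0.0


# Made-up sample data for two cores.
sample = [
    Event(10, 0, "begin"), Event(12, 1, "begin"),
    Event(30, 1, "abort"), Event(35, 0, "commit"),
    Event(40, 1, "begin"), Event(70, 1, "commit"),
]
traces = rebuild_transactions(sample)
```

Interval lists of this kind could then feed a timeline view or simple statistics such as the abort ratio, which is one way repeated aborts on a core might be spotted.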
