Modeling and Performance Evaluation of TSO-Preserving Binary Optimization

Program optimization on multi-core systems must preserve the program's memory consistency semantics. This paper studies binary optimization that preserves the Total Store Order (TSO) memory model. We introduce a novel approach that formally models TSO-preserving binary optimization on top of the formal TSO memory model. The major contribution of this modeling is a sound and complete algorithm that verifies TSO-preserving binary optimizations with O(N^2) complexity. We also developed a dynamic binary optimization system to evaluate the performance impact of TSO-preserving optimization. Our experiments show that dynamic binary optimization without memory optimizations can improve performance by 8.1%. TSO-preserving optimizations can improve performance by a further 4.8%, for a total of 12.9%. When the restrictions required to preserve TSO are ignored, dynamic binary optimization can raise the overall improvement to 20.4%.
