A Static Binary Translator for Efficient Migration of ARM based Applications

Binary translation is often used to migrate legacy binaries to platforms based on new architectures. This paper describes a static binary translator that translates ARM binaries to a MIPS-like architecture designed for embedded systems. The static translator handles basic architecture translations and performs optimizations to minimize instruction overhead. The conditional execution feature of the ARM architecture requires special attention during binary translation and optimization. With several optimizations that minimize condition updates and checks, the code translated from ARM to our target architecture increases the instruction path length by only 35% on the EEMBC benchmark.
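
To illustrate why conditional execution needs care, the following Python sketch shows one plausible way a translator might lower ARM conditionally executed instructions to a target without predication. It is not the paper's implementation. Several things are assumptions made only for this example: the names `ArmInsn` and `translate`, the register `$z` holding the ARM Z flag, the `skipN` labels, and the choice to pass mnemonics and register names through unchanged (register mapping, flag updates, and branch delay slots are ignored). The sketch covers only condition checks: a naive lowering emits one guarding branch per conditional instruction, while the grouped lowering shares a single branch across consecutive instructions with the same condition, as long as none of them rewrites the flags.

```python
from dataclasses import dataclass


@dataclass
class ArmInsn:
    """Hypothetical, simplified view of one ARM instruction."""
    op: str                   # mnemonic, e.g. "add"
    operands: str             # operand text, passed through unchanged
    cond: str = "al"          # "al" (always), "eq", or "ne"
    sets_flags: bool = False  # True for flag-updating forms such as "adds"


# Branch that skips the guarded code when the ARM condition is false.
# Assumption: "$z" holds 1 when the ARM Z flag is set, 0 otherwise.
SKIP_BRANCH = {
    "eq": "beq $z, $zero, {label}",  # Z clear -> "eq" fails -> skip
    "ne": "bne $z, $zero, {label}",  # Z set   -> "ne" fails -> skip
}


def translate(insns, group=True):
    """Lower a list of ArmInsn to MIPS-like text lines.

    With group=False every conditional instruction gets its own check.
    With group=True a run of instructions sharing one condition is
    guarded by a single check, and the run ends after any instruction
    that updates the flags.
    """
    out = []
    next_label = 0
    i = 0
    while i < len(insns):
        insn = insns[i]
        if insn.cond == "al":
            out.append(f"{insn.op} {insn.operands}")
            i += 1
            continue

        # Find the end (exclusive) of the run guarded by one check.
        j = i + 1
        if group:
            while (j < len(insns)
                   and insns[j].cond == insn.cond
                   and not insns[j - 1].sets_flags):
                j += 1

        label = f"skip{next_label}"
        next_label += 1
        out.append(SKIP_BRANCH[insn.cond].format(label=label))
        for guarded in insns[i:j]:
            out.append(f"{guarded.op} {guarded.operands}")
        out.append(f"{label}:")
        i = j
    return out


if __name__ == "__main__":
    program = [
        ArmInsn("add", "r0, r0, r1", cond="eq"),
        ArmInsn("sub", "r2, r2, r3", cond="eq"),
        ArmInsn("mov", "r4, r5"),
    ]
    print("\n".join(translate(program, group=False)))
    print("---")
    print("\n".join(translate(program, group=True)))
```

In this toy program the naive lowering emits two guarding branches for the two `eq` instructions, while the grouped lowering emits one. That is the kind of overhead a reduction in condition checks targets.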
