Execution migration in a heterogeneous-ISA chip multiprocessor

Prior research has shown that single-ISA heterogeneous chip multiprocessors (CMPs) have the potential for greater performance and energy efficiency than homogeneous CMPs. However, restricting the cores to a single ISA removes an important opportunity for greater heterogeneity. To take full advantage of a heterogeneous-ISA CMP, we must be able to migrate execution among heterogeneous cores in order to adapt to program phase changes and changing external conditions (e.g., system power state). This paper explores migration on heterogeneous-ISA CMPs. Migration is non-trivial because program state is kept in an architecture-specific form, so that state must be transformed before execution can move to a core with a different ISA. To keep migration cost low, the amount of state that requires transformation must be minimized. This work identifies large portions of program state whose form is not critical for performance and modifies the compiler to produce programs that keep most of their state in an architecture-neutral form. As a result, only a small number of data items must be repositioned and no pointers need to be changed, which yields low migration cost with minimal sacrifice of non-migration performance. Additionally, this work leverages binary translation to enable instantaneous migration. When migration is requested, the program is immediately moved to a different core, where binary translation runs for a short time until a function call is reached. At that point, program state is transformed and execution continues natively on the new core. This system can tolerate migrations as often as every 100 ms and still retain 95% of the performance of a system that neither performs nor supports migration.
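The migration scheme described in the abstract can be illustrated with a toy model. The Python sketch below is hypothetical and is not the paper's implementation. It assumes that the only architecture-specific state is the return address stored in each stack frame (indexed by an ISA-neutral call-site ID), and it models binary translation as a loop that executes the rest of the current function (simple `store` operations) on the new core until a `call` is reached, at which point the state is transformed. The names `RETURN_ADDR`, `Frame`, `ThreadState`, `transform_at_call`, and `migrate`, the ISA labels, and all addresses are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical return-point addresses for each call site, per ISA.
# Call-site IDs are ISA-neutral; only the code addresses differ between ISAs.
RETURN_ADDR: Dict[str, Dict[str, int]] = {
    "isa_a": {"main@call1": 0x1000, "work@call2": 0x1040},
    "isa_b": {"main@call1": 0x8000, "work@call2": 0x80A0},
}


@dataclass
class Frame:
    function: str
    local_vars: Dict[str, object]  # architecture-neutral layout, shared by all ISAs
    return_addr: Optional[int]     # architecture-specific: rewritten on migration


@dataclass
class ThreadState:
    isa: str
    heap: Dict[str, object]        # architecture-neutral; pointers into it never change
    frames: List[Frame] = field(default_factory=list)


def transform_at_call(state: ThreadState, target_isa: str) -> int:
    """Rewrite only the ISA-specific items (in this toy: return addresses).

    Heap and frame contents keep their neutral layout, so no pointer is touched.
    Returns the number of items rewritten.
    """
    addr_to_site = {addr: site for site, addr in RETURN_ADDR[state.isa].items()}
    rewritten = 0
    for frame in state.frames:
        if frame.return_addr is not None:
            site = addr_to_site[frame.return_addr]
            frame.return_addr = RETURN_ADDR[target_isa][site]
            rewritten += 1
    state.isa = target_isa
    return rewritten


def migrate(state: ThreadState, pending: List[Tuple[str, str, object]],
            target_isa: str) -> Tuple[int, int]:
    """Move to the target core at once, emulating source-ISA code until a call.

    `pending` holds the rest of the current function as (op, var, value) tuples;
    "store" ops stand in for binary-translated instructions.
    Returns (ops run under translation, items rewritten).
    """
    translated = 0
    while pending and pending[0][0] != "call":
        _op, var, value = pending.pop(0)
        state.frames[-1].local_vars[var] = value  # executed under translation
        translated += 1
    # Call boundary reached (or no ops left in this toy): transform, then run natively.
    return translated, transform_at_call(state, target_isa)


state = ThreadState(
    isa="isa_a",
    heap={"buf": [1, 2, 3]},
    frames=[Frame("main", {"p": "buf"}, None),
            Frame("work", {"i": 0}, 0x1000)],
)
pending = [("store", "i", 1), ("store", "i", 2), ("call", "helper", None)]
print(migrate(state, pending, "isa_b"))  # (2, 1)
```

Because the heap and the frame contents keep one neutral layout, the pointer-like value `p` and the heap entry it names survive migration unchanged, and only one return address is rewritten. A real system would also have to reposition register contents and any other ISA-specific data items, which this toy model omits.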
