Customized Core Layout: A Case Study on Dual-Core Dynamic Binary Translation System

Highly efficient hardware accelerators are customized to improve the performance of applications in specific domains. In this paper, we try to answer the question of where a customized processor core should be integrated to accelerate an application, by comparing performance on different multi-core platforms. We select dynamic binary translation (DBT) as the example application. First, we establish a performance model for a software DBT system and describe its design space. Then, we build an abstract architecture of a dual-core DBT system and list three possible locations at which to integrate a customized accelerator. Finally, we simulate the three platforms, each with the customized DBT core integrated at a different location, and discuss the results. The simulation results show that DBT can be sped up by a dual-core platform, and that platform 2, with a customized core on the DIMM, gives the best performance (about 48% better than the single-core DBT).
