Paper information - Bahurupi: A polymorphic heterogeneous multi-core architecture

Bahurupi: A polymorphic heterogeneous multi-core architecture

Computing systems have made an irreversible transition towards parallel architectures with the emergence of multi-cores. Moreover, power and thermal limits in embedded systems mandate the deployment of many simpler cores rather than a few complex cores on chip. Consumer electronic devices, on the other hand, need to support an ever-changing set of diverse applications with varying performance demands. While some applications can benefit from the thread-level parallelism offered by multi-core solutions, many applications still contain a substantial amount of sequential code. These sequential programs suffer from the limited exploitation of instruction-level parallelism in simple cores. We propose a reconfigurable multi-core architecture, called Bahurupi, that can successfully reconcile the conflicting demands of instruction-level and thread-level parallelism. Bahurupi can accelerate serial code by dynamically forming a coalition of two or more simple cores to offer increased instruction-level parallelism. In particular, Bahurupi can efficiently merge 2-4 simple 2-way out-of-order cores to reach or even surpass the performance of a more complex and power-hungry 4-way or 8-way out-of-order core. Compared to a baseline 2-way core, quad-core Bahurupi achieves a speedup of up to 5.61x (4.08x on average) for embedded workloads. On average, quad-core Bahurupi achieves a 17% performance improvement and a 43% improvement in energy consumption compared to an 8-way out-of-order baseline core on a diverse set of embedded benchmark applications.

Tulika Mitra | Mihai Pricopi
