Reliability-aware core partitioning in chip multiprocessors

Executing multiple applications concurrently is an important way to utilize the computational power of emerging chip multiprocessor (CMP) architectures. However, such multiprogramming raises a resource management and partitioning problem, for which numerous solutions have been proposed in the literature. Most resource partitioning schemes proposed to date follow performance-centric or energy-centric strategies. In contrast, this paper explores reliability-aware core partitioning strategies for CMPs. One of our schemes considers both performance and reliability objectives by minimizing a novel combined metric called the vulnerability-delay product (VDP). The vulnerability component of this metric is represented by the Thread Vulnerability Factor (TVF), a recently proposed metric for quantifying thread vulnerability in multicores, and the delay component is the execution time of the given application. In our experimental analysis, the proposed core partitioning schemes are compared in terms of normalized weighted speedup, normalized weighted reliability loss, and normalized weighted vulnerability-delay product gain for various workloads composed of benchmark applications.
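
The abstract defines VDP as the product of a vulnerability term (TVF) and a delay term (execution time). The Python sketch below illustrates how such a metric could drive a core-partitioning decision. It rests on assumptions that are not taken from the paper: per-application execution times and TVF values for each core count are assumed to be profiled in advance (the `Profile` dictionaries and their numbers are hypothetical), the objective is assumed to be the sum of per-application VDPs, and the search is a brute-force enumeration rather than the authors' algorithm.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical profile: core count -> (execution time in seconds, TVF).
Profile = Dict[int, Tuple[float, float]]


def vdp(exec_time: float, tvf: float) -> float:
    """Vulnerability-delay product; both factors are undesirable, so lower is better."""
    return tvf * exec_time


def core_partitions(total_cores: int, n_apps: int) -> List[Tuple[int, ...]]:
    """Every assignment that gives each application at least one core and uses all cores."""
    return [alloc for alloc in product(range(1, total_cores + 1), repeat=n_apps)
            if sum(alloc) == total_cores]


def best_vdp_partition(profiles: Sequence[Profile],
                       total_cores: int) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively pick the partition with the smallest summed VDP (assumed objective)."""
    best_alloc: Tuple[int, ...] = ()
    best_cost = float("inf")
    for alloc in core_partitions(total_cores, len(profiles)):
        # profile[cores] is (exec_time, tvf) for that application at that core count.
        cost = sum(vdp(*profile[cores]) for profile, cores in zip(profiles, alloc))
        if cost < best_cost:
            best_alloc, best_cost = alloc, cost
    return best_alloc, best_cost


# Illustrative numbers only: app_a scales well with more cores, app_b does not.
app_a: Profile = {1: (8.0, 0.30), 2: (4.5, 0.35), 3: (3.2, 0.40)}
app_b: Profile = {1: (6.0, 0.20), 2: (5.5, 0.25), 3: (5.3, 0.30)}

if __name__ == "__main__":
    alloc, cost = best_vdp_partition([app_a, app_b], total_cores=4)
    print(alloc, round(cost, 3))  # (3, 1) 2.48
```

With these made-up profiles, giving three cores to the well-scaling application yields the lowest combined VDP even though its TVF rises. Exhaustive enumeration grows roughly as `total_cores ** n_apps`, so it is only practical for small configurations.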
