Online testing of many-core systems in the Dark Silicon era

As the dark silicon era approaches, increasing the number of transistors no longer yields commensurate performance gains because of thermal design power (TDP) limits. The dark silicon problem means that the fraction of a chip that can switch at full frequency is shrinking, so designers will soon face the growing underutilization inherent in future technologies. At the same time, shrinking transistor sizes drastically increase susceptibility to internal defects, and a wide range of faults, such as aging-induced or transient faults, will appear more frequently. In this paper, we propose an online test scheduling algorithm for the dark silicon era that uses software-based self-test (SBST) to test dark cores while respecting the thermal design power of the system. Because the dark area of the system is dynamic and reshapes at runtime, the tested cores can be used by other applications in the near future. Empirical results show the effectiveness of the proposed algorithm in terms of applicability and fault coverage, with a negligible negative impact on system throughput.
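
To make the idea of power-aware test scheduling concrete, the following is a minimal sketch of one possible greedy policy, not the algorithm proposed in the paper. It assumes that each dark core has a known test power and a staleness counter, and it selects dark cores for self-test only while the combined power stays within the TDP. All names here (`Core`, `schedule_tests`, `test_power_w`, `since_last_test`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    is_dark: bool          # True if the core is currently idle (dark)
    test_power_w: float    # assumed power drawn while running a self-test routine
    since_last_test: int   # assumed staleness counter, in scheduling epochs


def schedule_tests(cores, active_power_w, tdp_w):
    """Greedily pick dark cores to test without exceeding the TDP budget.

    Illustrative policy only: the most stale dark cores are considered first,
    and a core is selected if its test power fits in the remaining headroom.
    """
    headroom = tdp_w - active_power_w
    selected = []
    candidates = sorted(
        (c for c in cores if c.is_dark),
        key=lambda c: c.since_last_test,
        reverse=True,
    )
    for core in candidates:
        if core.test_power_w <= headroom:
            selected.append(core.core_id)
            headroom -= core.test_power_w
    return selected


if __name__ == "__main__":
    cores = [
        Core(0, True, 2.0, 5),
        Core(1, False, 2.0, 9),   # busy core, never selected for testing
        Core(2, True, 3.0, 8),
        Core(3, True, 2.0, 1),
    ]
    print(schedule_tests(cores, active_power_w=90.0, tdp_w=95.0))
```

In this sketch, cores that are running applications are never interrupted, which reflects the goal of keeping the impact on throughput small. A real scheduler would also need to react when the dark region reshapes at runtime.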
