Towards heterogeneous 3D-stacked reliable computing with von Neumann multiplexing

The reliability of near-future nanometer-range CMOS and novel nano-computing devices is greatly affected by undesired physical phenomena that emerge with continued technology scaling. The emerging 3D Stacked Integrated Circuit (3D-SIC) technology allows devices manufactured in different technologies, and thus with different reliability, to be stacked on top of each other and connected through low-latency links. In this paper, we propose to take advantage of this new design space dimension, i.e., the individual reliability of devices, when applying the von Neumann multiplexing redundancy technique. Our analysis suggests that which multiplexing units matter most for reliability depends on the error rate of the individual gates in the system: for high error rates the units at the end of the restoration chain are critical, while for low error rates the units at the beginning of the restoration chain are critical. We further introduce and evaluate the first, to the best of our knowledge, heterogeneous 3D-SIC multiplexing arrangements. Our results indicate that, assuming delay and area double for a technology with an order of magnitude higher reliability, a heterogeneous multiplexing scheme with gates having high and medium error rates can achieve a 1.79× reduction in delay and area, with a 9% loss in the Reliability Improvement Index (RII), over its homogeneous counterpart with only medium-reliability gates. For medium and low error rates, a minimum 1% RII loss can be traded for a delay and footprint reduction of 5.66× and 4.25×, respectively.
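
To make the idea of per-unit gate reliability concrete, the following is a minimal sketch of a NAND multiplexing chain in which each stage may use a different gate error rate. It is not the evaluation method of the paper: it assumes the classical large-bundle (mean-field) approximation, in which only the expected fraction of stimulated lines is tracked, and it assumes von Neumann faults (a gate output is inverted with probability eps). The function names, the two-stage restorative units, and the example error rates are illustrative assumptions.

```python
def nand_stage(x, y, eps):
    """Expected fraction of stimulated output lines of one NAND stage.

    x, y: fractions of stimulated lines in the two input bundles,
          assumed to be randomly and independently permuted.
    eps:  probability that a gate inverts its output (assumed fault model).
    """
    ideal = 1.0 - x * y  # fault-free NAND output fraction
    return ideal * (1.0 - eps) + (1.0 - ideal) * eps


def multiplex_chain(x, y, eps_exec, eps_restore):
    """Executive NAND stage followed by a chain of restorative NAND stages.

    eps_restore holds one error rate per restorative stage, so each stage
    can model a device layer with its own reliability (hypothetical setup).
    An even number of restorative stages preserves the output polarity.
    """
    z = nand_stage(x, y, eps_exec)
    for eps in eps_restore:
        # Each restorative stage feeds two permuted copies of the bundle
        # into a NAND stage.
        z = nand_stage(z, z, eps)
    return z


if __name__ == "__main__":
    # Inputs are 90% stimulated; a fault-free NAND would output mostly 0.
    x = y = 0.9
    homogeneous = multiplex_chain(x, y, 1e-2, [1e-2] * 4)
    # Illustrative heterogeneous case: more reliable gates at the chain end.
    heterogeneous = multiplex_chain(x, y, 1e-2, [1e-2, 1e-2, 1e-3, 1e-3])
    print("homogeneous  :", homogeneous)
    print("heterogeneous:", heterogeneous)
```

Varying the position of the more reliable stages in `eps_restore` gives a rough feel for how the placement of reliable units along the restoration chain affects the output, although a finite-bundle probabilistic analysis would be needed to reproduce figures such as the RII.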
