Memory System Design for Ultra Low Power, Computationally Error Resilient Processor Microarchitectures

Dennard scaling ended a decade ago. Energy reduction by lowering supply voltage has been limited by guard bands and by a subthreshold slope of over 60 mV/decade in MOSFETs. Newly proposed logic devices, on the other hand, maintain a high on/off ratio for drain currents even at significantly lower operating voltages. However, such ultra-low-power technology would eventually suffer from intermittent errors in logic as a result of operating close to the thermal noise floor. Computational error correction mitigates this issue by efficiently correcting stochastic bit errors that may occur in computational logic operating at low signal energies, thereby allowing energy reduction by lowering the supply voltage to tens of millivolts. Cores based on a Redundant Residue Number System (RRNS), which represents a number using a tuple of smaller numbers, are a promising candidate for implementing energy-efficient computational error correction. However, prior RRNS core microarchitectures abstract away the memory hierarchy and do not consider the power-performance impact of RNS-based memory addressing. Compared with a non-error-correcting core that addresses memory in binary, naive RNS-based memory addressing schemes cause a slowdown of over 3x/2x for in-order/out-of-order cores, respectively. In this paper, we analyze RNS-based memory access pattern behavior and provide solutions in the form of novel schemes and the resulting design space exploration, thereby extending and enabling a tangible, ultra-low-power RRNS-based architecture.
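To make the RRNS idea above concrete, the following minimal Python sketch shows a number held as a tuple of residues, a residue-wise addition, and recovery from a single corrupted residue. It is an illustration under stated assumptions, not the microarchitecture studied in the paper: the moduli (3, 5, 7) with redundant moduli (11, 13), the function names, and the drop-one-residue correction strategy are all chosen here for clarity.

```python
from math import prod

# Hypothetical, illustrative moduli (pairwise coprime); not taken from the paper.
NON_REDUNDANT = (3, 5, 7)        # legitimate values are 0 .. 104
REDUNDANT = (11, 13)             # two extra residues permit single-error correction
MODULI = NON_REDUNDANT + REDUNDANT
LEGIT_RANGE = prod(NON_REDUNDANT)


def encode(x):
    """Represent x as a tuple of residues, one per modulus."""
    return tuple(x % m for m in MODULI)


def add(a, b):
    """Residue-wise addition: each residue is updated independently."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def crt(residues, moduli):
    """Reconstruct an integer from residues via the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m


def correct(residues):
    """Return the legitimate value, tolerating at most one wrong residue.

    Drops one residue at a time; only a reconstruction that excludes the
    faulty residue falls inside the legitimate range.
    """
    value = crt(residues, MODULI)
    if value < LEGIT_RANGE:
        return value
    for skip in range(len(MODULI)):
        rs = [r for i, r in enumerate(residues) if i != skip]
        ms = [m for i, m in enumerate(MODULI) if i != skip]
        candidate = crt(rs, ms)
        if candidate < LEGIT_RANGE:
            return candidate
    return None  # more than one residue is wrong


total = add(encode(42), encode(17))      # residues of 59
faulty = list(total)
faulty[2] = (faulty[2] + 2) % MODULI[2]  # inject an error into one residue
print(correct(tuple(faulty)))
```

Because each residue is computed independently, a stochastic error in one residue channel leaves the others intact, which is what lets the redundant residues identify and repair it in this sketch.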
