Improving Network Processing Dependability with Heterogeneous Reliability Cores

An emerging problem facing future high-performance embedded multi-core and network processors is transient faults caused by radiation, noise, and other factors. These faults will likely make future multi-core processors less reliable as chip features shrink, voltages decrease, and the number of cores increases. To address this problem, we propose a systems approach that manages and allocates reliability according to software process requirements. The proposed heterogeneous multi-core architecture is based on cores with differing reliabilities. Critical and non-critical software components are segregated and matched with the higher- and lower-reliability cores, respectively. We apply this method to a network processing example and show that heterogeneous reliability cores can significantly reduce the overall system failure rate while offering performance, power utilization, and chip area equal to or better than those of symmetric cores.
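
The matching of critical components to higher-reliability cores can be illustrated with a minimal sketch. It is not the paper's model. The core names, task names, and failure rates are hypothetical, and it assumes faults are independent, each core runs one task, and only a fault on a core hosting a critical task counts as a system failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    failure_rate: float  # hypothetical transient-fault rate, failures per hour


@dataclass(frozen=True)
class Task:
    name: str
    critical: bool


def assign(tasks, cores):
    """Pair tasks with cores one-to-one, critical tasks on the most reliable cores."""
    if len(tasks) > len(cores):
        raise ValueError("this sketch assumes at most one task per core")
    ranked_cores = sorted(cores, key=lambda c: c.failure_rate)
    # sorted() is stable, so critical tasks move to the front in their original order
    ranked_tasks = sorted(tasks, key=lambda t: not t.critical)
    return list(zip(ranked_tasks, ranked_cores))


def critical_failure_rate(assignment):
    """Sum the failure rates of cores hosting critical tasks.

    Assumption: a fault on a core running a non-critical task is tolerable
    (for example, a dropped packet) and is not counted as a system failure.
    """
    return sum(core.failure_rate for task, core in assignment if task.critical)


# Hypothetical network-processing workload: one critical control task and
# three non-critical packet-forwarding tasks.
tasks = [
    Task("forward_0", critical=False),
    Task("control_plane", critical=True),
    Task("forward_1", critical=False),
    Task("forward_2", critical=False),
]

# Hypothetical chips: one hardened core plus three plain cores, versus four plain cores.
heterogeneous = [Core("hardened", 1e-7)] + [Core(f"plain_{i}", 1e-5) for i in range(3)]
symmetric = [Core(f"plain_{i}", 1e-5) for i in range(4)]

hetero_assignment = assign(tasks, heterogeneous)
hetero_rate = critical_failure_rate(hetero_assignment)
symmetric_rate = critical_failure_rate(assign(tasks, symmetric))

print(f"heterogeneous critical failure rate: {hetero_rate:.1e} per hour")
print(f"symmetric critical failure rate:     {symmetric_rate:.1e} per hour")
```

Under these assumed rates the critical task lands on the hardened core, so the critical failure rate is set by that core alone. The sketch says nothing about performance, power, or area, which the paper evaluates separately.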
