Distributed Comparison Test Driven Multiprocessor Speed-Tuning: Targeting Performance Gains under Extreme Process Variations

Exhaustive speed testing of every core in a large chip multiprocessor (CMP) under extreme inter- and intra-die process variations is expensive in test time. It also may not guarantee full CMP functionality, because it does not cover timing failures induced by second-order effects such as crosstalk, power/ground bounce, and speed-limiting design bugs that are not "caught" by the relevant combinatorial design verification algorithms. The goal of this research is to develop a methodology that determines the "safe" speed of each core in a large CMP under the assumption that some speed defects and design bugs are likely to escape conventional delay testing procedures. Accordingly, a baseline speed is first determined for each CMP core from conventional tests, using a comparison-based speed-tuning algorithm. To prevent "blue screens" caused by test escapes, relevant applications are then run on the CMP in "fail-safe/redundant" mode to "top up" speed-defect coverage. Over a period of time, a concurrent tuning algorithm determines the true "safe" speeds of all the cores in O(log(Fp)) steps, independent of the size of the array, where Fp is the number of discrete clock speeds possible. Subsequently, each core is run "independently" at its highest "safe" clock speed, achieving the maximum possible CMP performance.
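
The O(log(Fp)) bound can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that the tuning behaves like a per-core binary search over the Fp discrete speed indices, that all cores are probed in the same step, that a core passing at one speed also passes at every lower speed, and that the lowest speed is always safe. The names `tune_safe_speeds` and `passes` are hypothetical; `passes(core, speed)` stands in for whatever comparison-based check reports a pass or fail.

```python
from typing import Callable, List, Tuple


def tune_safe_speeds(
    num_cores: int,
    num_speeds: int,
    passes: Callable[[int, int], bool],
) -> Tuple[List[int], int]:
    """Return (highest passing speed index per core, number of concurrent steps).

    Assumptions (not taken from the source): pass/fail is monotonic in the
    speed index, and speed index 0 is safe for every core.
    """
    lo = [0] * num_cores               # highest index known (or assumed) to pass
    hi = [num_speeds - 1] * num_cores  # highest index not yet ruled out
    steps = 0
    while any(l < h for l, h in zip(lo, hi)):
        steps += 1  # one step probes every unresolved core at once
        for core in range(num_cores):
            if lo[core] < hi[core]:
                mid = (lo[core] + hi[core] + 1) // 2  # upper midpoint
                if passes(core, mid):
                    lo[core] = mid
                else:
                    hi[core] = mid - 1
    return lo, steps


if __name__ == "__main__":
    # Hypothetical per-core limits, for illustration only.
    limits = [3, 0, 7, 5]
    speeds, steps = tune_safe_speeds(len(limits), 8, lambda c, s: s <= limits[c])
    print(speeds, steps)
```

Because each step halves the candidate range of every core simultaneously, the step count in this sketch depends only on the number of speed levels and not on the number of cores, which mirrors the size-independence claimed above.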
