A Distributed Controller for Managing Speculative Functional Units in High Level Synthesis

Speculative functional units (SFUs) are arithmetic functional units that use a predictor for the carry signal. Carry prediction shortens the critical path of the functional unit, so the average-case performance of these units is determined by the hit rate of the predictor. When a misprediction occurs, the datapath control mechanism must coordinate the SFUs to perform the correction and keep the datapath in a correct state. The main challenge is to devise a control mechanism that corrects mispredictions without adversely impacting overall performance. In this paper, we present techniques for designing a datapath controller that allows seamless deployment of SFUs in high-level synthesis. We have developed two techniques, one for each of the two main control paradigms: centralized and distributed control. The centralized approach stops the execution of the entire datapath on each misprediction and resumes execution once the correct value of the carry is known. The distributed approach decouples the functional unit that suffered the misprediction from the rest of the datapath. The remaining functional units can therefore continue executing, and different units may be in different scheduling states at the same time. We tested datapaths that use both linear and logarithmic structures for the speculative arithmetic functional units. Our results show that execution time can be reduced by as much as 38% (33% on average) for linear structures and by as much as 37.2% (25% on average) for logarithmic structures.
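The difference between the two control paradigms can be illustrated with a minimal Python sketch. It is not the controller described in the paper: the function names, the split of the adder into two halves, the static prediction of a 0 carry, the one-cycle correction penalty, and the assumption that each functional unit executes an independent chain of additions are all hypothetical choices made only for illustration.

```python
def speculative_add(a, b, width=16, predicted_carry=0):
    """Toy speculative adder (hypothetical model).

    The carry into the upper half is predicted instead of waited for.
    Returns (correct_sum, mispredicted).
    """
    half = width // 2
    mask_lo = (1 << half) - 1
    lo = (a & mask_lo) + (b & mask_lo)
    actual_carry = lo >> half                      # true carry out of the lower half
    hi = (a >> half) + (b >> half) + actual_carry
    result = ((hi << half) | (lo & mask_lo)) & ((1 << width) - 1)
    return result, actual_carry != predicted_carry


def centralized_cycles(chains):
    """All units advance in lockstep; any misprediction stalls every unit.

    Assumes a one-cycle correction penalty per stalled step.
    """
    steps = max(len(chain) for chain in chains)
    cycles = 0
    for step in range(steps):
        ops = [chain[step] for chain in chains if step < len(chain)]
        miss = any(speculative_add(a, b)[1] for a, b in ops)
        cycles += 2 if miss else 1
    return cycles


def distributed_cycles(chains):
    """Each unit pays only for its own mispredictions.

    Assumes the chains are independent, so the slowest chain sets the total.
    """
    return max(
        sum(2 if speculative_add(a, b)[1] else 1 for a, b in chain)
        for chain in chains
    )


# Two units, two additions each; 0x00FF + 0x0001 carries into the upper half.
chains = [
    [(0x00FF, 0x0001), (1, 1)],   # unit 0 mispredicts on its first addition
    [(1, 1), (0x00FF, 0x0001)],   # unit 1 mispredicts on its second addition
]
print(centralized_cycles(chains), distributed_cycles(chains))
```

In this toy workload the two mispredictions fall in different steps, so the lockstep controller pays the penalty twice while each decoupled unit pays it once. Real datapaths have data dependencies between units, which this sketch deliberately ignores.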
