A data-driven energy efficient and flexible compute fabric architecture: For adaptive computing applied to ULSI of FFT

In this work, we investigate architectures that can provide both the benefits of dedicated hardware implementations and the flexibility of software-defined environments. We call this new approach a data-defined environment, in which hardware and software scale together with workload variability to provide state-of-the-art hardware energy efficiency. An integrated architecture for rapidly implementing efficient large-scale Digital Signal Processing (DSP) functions is presented. The target DSP functions are represented by an application space with one or more dimensions and several ensembles of Adaptive Computing Fabrics (ACF). The proposed fabric is shown to achieve deterministic performance exceeding 245 GOPs/mW for data workloads characterized by high dynamic variability, such as FFTs of various sizes, including 64, 128, 512, 1,024, 2,048, 4,098, 8,192, 262,144 and 746,496. Experimental results show improvements on the order of 1000× in power efficiency compared to published alternatives for several applications, including H.265 High-Efficiency Video Coding (HEVC), Bluetooth, LTE, xDSL/DVB, WLAN and mm-wave Wireless Personal Area Networks (WPAN).
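To give a feel for what an efficiency figure in GOPs/mW means across FFT sizes, the following is a minimal sketch that converts the 245 GOPs/mW figure into an approximate energy per transform. It assumes the common nominal operation count of 5·N·log2(N) for a complex FFT of length N, and it assumes that one "operation" in the reported figure corresponds to one operation in that count; neither assumption is stated in the abstract. The function names are hypothetical and chosen for illustration only.

```python
import math

# 245 GOPs/mW = 245e9 operations per second per 1e-3 W,
# which is the same as 2.45e14 operations per joule.
EFFICIENCY_OPS_PER_JOULE = 245e9 / 1e-3


def nominal_fft_ops(n: int) -> float:
    """Nominal operation count 5*N*log2(N) for a length-N complex FFT.

    ASSUMPTION: this is a widely used benchmarking convention, not the
    operation count used by the authors.
    """
    return 5.0 * n * math.log2(n)


def energy_per_transform_joules(
    n: int, ops_per_joule: float = EFFICIENCY_OPS_PER_JOULE
) -> float:
    """Estimated energy for one length-N FFT at the given efficiency."""
    return nominal_fft_ops(n) / ops_per_joule


if __name__ == "__main__":
    # A few of the FFT sizes listed in the abstract.
    for size in (64, 1024, 8192, 262144):
        energy = energy_per_transform_joules(size)
        print(f"N={size:>7}: about {energy:.3e} J per transform")
```

Under these assumptions the energy per transform grows as N·log2(N), so the estimate is only a rough scale reference and not a reproduction of the reported measurements.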
