Pipeline Reconfigurable DSP for Dynamically Reconfigurable Architectures

Dynamically reconfigurable architectures, such as NATURE, achieve high logic density and low reconfiguration latency compared to traditional field-programmable gate arrays. Compared with fine-grained NATURE, a NATURE architecture that incorporates reconfigurable DSP blocks significantly improves performance when mapping compute-intensive arithmetic operations. However, the DSP block fails to fully exploit the potential of run-time reconfiguration. This paper presents a pipeline reconfigurable DSP architecture targeting the NATURE platform, which supports temporal logic folding. The proposed approach allows the DSP pipeline stages to be reconfigured independently, so that each stage can perform a different function at every clock interval during run time. In addition, a multistage clock gating technique is used in the design to minimize power consumption. We also extend the NanoMap tool, which maps circuits onto the NATURE platform, to exploit the pipeline-level reconfigurability of the proposed DSP block and enable efficient resource sharing and area/power reduction. Simulation results on 13 benchmarks show that the proposed approach enables an area-delay improvement of up to 3.6× compared to the fine-grained NATURE architecture. The proposed architecture also delivers a 31.42% reduction in area and a maximum 4.18× improvement in power-delay compared to the existing NATURE architecture. We also observe average improvements of 29% and 54.13% in performance and area, respectively, compared to the commercial Xilinx Spartan-3A DSP platform, thereby allowing designers to tune circuit implementations for area, power, or performance benefits.
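
The idea of reconfiguring each pipeline stage independently, with clock gating on idle stages, can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' hardware design: the `Stage` class, the `run_pipeline` function, the two-stage depth, and the example operations are all hypothetical, and gating is modeled simply as a stage register holding its value for a cycle.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical per-stage operation; the real DSP block's operation set
# is not specified in the abstract.
Op = Callable[[int], int]


class Stage:
    """One pipeline stage with an output register (illustrative model)."""

    def __init__(self) -> None:
        self.reg = 0        # value held in the stage's output register
        self.clocked = 0    # number of cycles in which the register was clocked

    def tick(self, op: Optional[Op], data_in: int) -> None:
        # op is None models a clock-gated cycle: the register keeps its value.
        if op is None:
            return
        self.reg = op(data_in)
        self.clocked += 1


def run_pipeline(
    schedule: List[List[Optional[Op]]], inputs: List[int]
) -> Tuple[List[int], List[int]]:
    """Apply a per-cycle, per-stage configuration schedule.

    schedule[c][s] is the operation stage s performs in cycle c,
    or None if stage s is gated in that cycle.
    """
    n_stages = len(schedule[0])
    stages = [Stage() for _ in range(n_stages)]
    outputs = []
    for cycle, ops in enumerate(schedule):
        # Sample all registers first so every stage sees last cycle's values.
        prev = [s.reg for s in stages]
        x = inputs[cycle] if cycle < len(inputs) else 0
        for i, stage in enumerate(stages):
            data_in = x if i == 0 else prev[i - 1]
            stage.tick(ops[i], data_in)
        outputs.append(stages[-1].reg)
    return outputs, [s.clocked for s in stages]


# Example schedule: each stage gets a different function in each cycle.
mul3 = lambda v: v * 3
add1 = lambda v: v + 1
add10 = lambda v: v + 10
dbl = lambda v: v * 2

schedule = [
    [mul3, None],   # cycle 0: stage 0 multiplies, stage 1 is gated
    [add1, add10],  # cycle 1: both stages active with new functions
    [None, dbl],    # cycle 2: stage 0 is gated, stage 1 doubles
]
outputs, clocked = run_pipeline(schedule, inputs=[2, 5, 0])
print(outputs, clocked)
```

In this model the count of clocked cycles per stage is a rough stand-in for the switching activity that gating avoids; it does not estimate real power.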
