Analysis and optimization of a deeply pipelined FPGA soft processor

FPGA soft processors have been shown to achieve high frequency when designed around the specific capabilities of the heterogeneous resources on modern FPGAs. However, such performance comes at the cost of deep pipelines, which can result in a larger number of idle cycles when executing programs with long dependency chains in the instruction sequence. We perform a full design-space exploration of a DSP-block-based soft processor to examine the effect of pipeline depth on frequency, area, and program runtime, noting the significant number of NOPs required to resolve dependencies. We then explore the potential of a restricted data forwarding approach to improve runtime by significantly reducing NOP padding. The result is a processor that runs close to the fabric limit of 500 MHz, together with a case for simple data forwarding.
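To make the NOP-padding effect concrete, the following is a minimal sketch of a toy in-order issue model, not the processor or methodology described above. The function name `count_nops`, the `(dest, sources)` program encoding, and the single `latency` parameter (the number of issue slots that must separate a producer from a dependent consumer) are all assumptions introduced for illustration. In this simplified model, a deeper pipeline corresponds to a larger `latency`, and forwarding corresponds to a smaller one.

```python
def count_nops(program, latency):
    """Count NOPs needed to pad data dependencies in a toy in-order pipeline.

    program: list of (dest, sources) tuples, one per instruction (hypothetical encoding).
    latency: issue slots between a producer and the earliest dependent consumer
             (assumed to grow with pipeline depth and shrink with forwarding).
    """
    ready = {}   # register name -> earliest cycle a consumer may issue
    cycle = 0    # cycle at which the next instruction would issue
    nops = 0
    for dest, sources in program:
        # The instruction cannot issue until all of its sources are ready.
        earliest = max((ready.get(s, 0) for s in sources), default=0)
        if earliest > cycle:
            nops += earliest - cycle   # pad the gap with NOPs
            cycle = earliest
        ready[dest] = cycle + latency
        cycle += 1
    return nops


# A three-instruction dependency chain: r1 -> r2 -> r3.
chain = [("r1", []), ("r2", ["r1"]), ("r3", ["r2"])]
# Three independent instructions with no shared registers.
independent = [("r1", []), ("r2", []), ("r3", [])]

deep_chain_nops = count_nops(chain, latency=5)
forwarded_chain_nops = count_nops(chain, latency=1)
independent_nops = count_nops(independent, latency=5)
```

Under these assumptions, the dependency chain needs padding that grows with `latency`, while the independent sequence needs none, which mirrors the qualitative trade-off between pipeline depth and idle cycles.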
