Near-Optimal Microprocessor and Accelerators Codesign with Latency and Throughput Constraints

A systematic methodology is proposed for near-optimal software/hardware codesign mapping onto an FPGA platform with a microprocessor and hardware accelerators. The mapping steps address the inter-organization, the foreground memory management, and the datapath mapping. Each step is described by parameters and equations combined in a scalable template. Mapping decisions are propagated as design constraints to prune suboptimal options in subsequent steps. Instantiating the parameters produces several performance-area Pareto points. To evaluate the methodology, we map a real-time bio-imaging application and loop-dominated benchmarks.
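As a rough illustration of what "performance-area Pareto points" under a latency constraint means, the sketch below filters a set of candidate design points down to the non-dominated ones. It is not the paper's template or its equations: the `DesignPoint` fields, the `pareto_front` helper, and all numbers are hypothetical and chosen only to show the idea of pruning by a constraint and then keeping the Pareto-optimal trade-offs.

```python
from typing import List, NamedTuple, Optional


class DesignPoint(NamedTuple):
    """Hypothetical design point: one instantiation of the template parameters."""
    name: str
    latency_cycles: int  # lower is better
    area_luts: int       # lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b in both metrics and strictly better in one."""
    no_worse = a.latency_cycles <= b.latency_cycles and a.area_luts <= b.area_luts
    better = a.latency_cycles < b.latency_cycles or a.area_luts < b.area_luts
    return no_worse and better


def pareto_front(points: List[DesignPoint],
                 max_latency: Optional[int] = None) -> List[DesignPoint]:
    """Drop points violating the latency constraint, then keep non-dominated ones."""
    feasible = [p for p in points
                if max_latency is None or p.latency_cycles <= max_latency]
    front = [p for p in feasible
             if not any(dominates(q, p) for q in feasible)]
    return sorted(front, key=lambda p: p.latency_cycles)


# Made-up candidates, for illustration only.
candidates = [
    DesignPoint("A", 1000, 500),
    DesignPoint("B", 800, 700),
    DesignPoint("C", 1200, 450),
    DesignPoint("D", 900, 900),   # dominated by B
    DesignPoint("E", 600, 1200),
]

unconstrained = [p.name for p in pareto_front(candidates)]
constrained = [p.name for p in pareto_front(candidates, max_latency=1000)]
print(unconstrained)
print(constrained)
```

Applying the constraint before the dominance check mirrors, in a very simplified form, the idea of using earlier decisions to prune options before later steps are explored.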
