Speedups in embedded systems with a high-performance coprocessor datapath

This article presents the speedups achieved in a generic single-chip microprocessor system by employing a high-performance datapath. The datapath acts as a coprocessor that accelerates computational-intensive kernel sections thereby increasing the overall performance. We have previously introduced the datapath which is composed of Flexible Computational Components (FCCs). These components can realize any two-level template of primitive operations. The automated coprocessor synthesis method from high-level software description and its integration to a design flow for executing applications on the system is presented. For evaluating the effectiveness of our coprocessor approach, analytical study in respect to the type of the custom datapath and to the microprocessor architecture is performed. The overall application speedups of several real-life applications relative to the software execution on the microprocessor are estimated using the design flow. These speedups range from 1.75 to 5.84, with an average value of 3.04, while the overhead in circuit area is small. The design flow achieved the acceleration of the applications near to theoretical speedup bounds. A comparison with another high-performance datapath showed that the proposed coprocessor achieves smaller area-time products by an average of 23% for the generated datapaths. Additionally, the FCC coprocessor achieves better performance in accelerating kernels relative to software-programmable DSP cores.

[1]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[2]  David Jones High performance , 1989, Nature.

[3]  Stamatis Vassiliadis,et al.  A Reconfigurable Functional Unit for TriMedia/CPU64. A Case Study , 2002, Embedded Processor Design Challenges.

[4]  Rudy Lauwereins,et al.  Design methodology for a tightly coupled VLIW/reconfigurable matrix architecture: a case study , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[5]  Giovanni De Micheli,et al.  Synthesis and Optimization of Digital Circuits , 1994 .

[6]  Scott A. Mahlke,et al.  High-level synthesis of nonprogrammable hardware accelerators , 2000, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors.

[7]  Frank Vahid,et al.  Hardware/software partitioning of software binaries: a case study of H.264 decode , 2005, 2005 Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).

[8]  Carl Ebeling,et al.  Implementing an OFDM receiver on the RaPiD reconfigurable architecture , 2003, IEEE Transactions on Computers.

[9]  S. Kumar,et al.  A benchmark suite for evaluating configurable computing systems—status, reflections, and future directions , 2000, FPGA '00.

[10]  Werner Geurts Accelerator Data-Path Synthesis for High-Throughput Signal Processing Applications , 1996 .

[11]  Peter Marwedel,et al.  Built-in chaining: introducing complex components into architectural synthesis , 1997, Proceedings of ASP-DAC '97: Asia and South Pacific Design Automation Conference.

[12]  Spyros Tragoudas,et al.  A high-performance data path for synthesizing DSP kernels , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[13]  Richard Taylor,et al.  Co-processor synthesis: a new methodology for embedded software acceleration , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[14]  TragoudasSpyros,et al.  Speedups in embedded systems with a high-performance coprocessor datapath , 2008 .

[15]  Jason Cong,et al.  Application-specific instruction generation for configurable processor architectures , 2004, FPGA '04.

[16]  Sri Parameswaran,et al.  Novel architecture for loop acceleration: a case study , 2005, 2005 Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).

[17]  Fadi J. Kurdahi,et al.  MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications , 2000, IEEE Trans. Computers.

[18]  Scott A. Mahlke,et al.  PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators , 2002, J. VLSI Signal Process..

[19]  Majid Sarrafzadeh,et al.  Instruction generation for hybrid reconfigurable systems , 2001, IEEE/ACM International Conference on Computer Aided Design. ICCAD 2001. IEEE/ACM Digest of Technical Papers (Cat. No.01CH37281).

[20]  John Wawrzynek,et al.  The Garp Architecture and C Compiler , 2000, Computer.

[21]  Rolf Ernst,et al.  A processor-coprocessor architecture for high end video applications , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[22]  J. W. Crenshaw Math toolkit for real-time programming , 2000 .

[23]  Paolo Ienne,et al.  Automatic application-specific instruction-set extensions under microarchitectural constraints , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[24]  Matt Marcus Jpeg Image Compression , .

[25]  Srivaths Ravi,et al.  Synthesis of custom processors based on extensible platforms , 2002, ICCAD 2002.

[26]  Carl Ebeling,et al.  Implementing an OFDM Receiver on the RaPiD Reconfigurable Architecture , 2004, IEEE Trans. Computers.

[27]  Miodrag Potkonjak,et al.  Performance optimization using template mapping for datapath-intensive high-level synthesis , 1996, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[28]  Frank Vahid,et al.  Improving Software Performance with Configurable Logic , 2002, Des. Autom. Embed. Syst..

[29]  Jonathan Rose,et al.  Measuring the Gap Between FPGAs and ASICs , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[30]  Hai Zhou,et al.  Parallel CAD: Algorithm Design and Programming Special Section Call for Papers TODAES: ACM Transactions on Design Automation of Electronic Systems , 2010 .

[31]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[32]  M. Bister,et al.  Automated segmentation of cardiac MR images , 1989, [1989] Proceedings. Computers in Cardiology.

[33]  Sri Parameswaran,et al.  INSIDE: INstruction Selection/Identification & Design Exploration for extensible processors , 2003, ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486).