Exploring the speedups of embedded microprocessor systems utilizing a high-performance coprocessor data-path

Abstract The speedups achieved in a generic microprocessor system by employing a high-performance data-path are presented. The data-path acts as a coprocessor that accelerates time-critical code segments, called kernels, thereby increasing overall performance. The data-path, previously introduced by the authors, is composed of Flexible Computational Components (FCCs) that can realize any two-level template of primitive operations. A design flow for executing applications on the system, which integrates the automated coprocessor synthesis method, is presented. To evaluate the effectiveness of the coprocessor approach, an analytical exploration with respect to the type of custom data-path and the microprocessor architecture is performed. The kernel and overall application speedups of six real-life applications, relative to software execution on the microprocessor, are estimated using the design flow. Kernel speedups of up to 155 are achieved, resulting in an average overall improvement of 2.78 with a small overhead in circuit area. The design flow accelerates the applications close to the theoretical bounds. A comparison with another high-performance data-path shows that the proposed coprocessor achieves better performance while having smaller area-time products for the generated data-paths.
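The gap between a large kernel speedup and a much smaller overall speedup can be illustrated with Amdahl's law. The sketch below is an assumption made for illustration, not the estimation method of the paper: it ignores microprocessor-coprocessor communication overhead, and the function names and the kernel fraction of 0.65 are hypothetical values that do not come from the reported experiments.

```python
def overall_speedup(kernel_fraction, kernel_speedup):
    """Amdahl's law: overall speedup when only the kernels are accelerated.

    kernel_fraction: share of software execution time spent in kernels (0..1).
    kernel_speedup:  speedup of the kernels on the coprocessor (> 0).
    Communication overhead is ignored (simplifying assumption).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # Non-kernel time is unchanged; kernel time shrinks by kernel_speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def speedup_bound(kernel_fraction):
    """Theoretical bound: overall speedup if kernels took zero time."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    fraction = 0.65  # hypothetical kernel share, not taken from the paper
    print(overall_speedup(fraction, 155))  # roughly 2.82
    print(speedup_bound(fraction))         # roughly 2.86
```

Under these assumptions, a kernel speedup of 155 already brings the overall speedup close to the bound set by the non-accelerated code, which is consistent with the observation that further kernel acceleration yields little additional overall gain.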
