Automated instruction-set extension of embedded processors with application to MPEG-4 video encoding

A recent approach to platform-based design involves the use of extensible processors, which offer possibilities for architecture customization. Part of the designer's responsibility is the domain-specific extension of the baseline processor to fit customer requirements. Key issues in this process are automated application analysis and the identification and selection of candidate instructions for implementation as application-specific functional units (AFUs). In this paper, a design approach that encapsulates automated workload characterization and instruction generation is used to extend processors so that they efficiently support embedded application sets. The method used for instruction generation is a highly parameterized adaptation of the MaxMISO technique, which allows for fast design space exploration. It is shown that only a small number of AFUs are needed to support the algorithms of interest (MPEG-4 encoding kernels) and that performance improvements of 2× to 3.5× are achievable, even though further possibilities such as subword parallelization are not currently considered.
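For readers unfamiliar with MaxMISO, the following is a minimal sketch of the generic idea only: partitioning a basic-block dataflow graph into maximal multiple-input single-output clusters. It is not the parameterized adaptation described in the paper. The function name `max_misos`, the `excluded` parameter (operations assumed ineligible for merging, such as loads and stores), and the node names are all hypothetical, chosen for illustration.

```python
def max_misos(nodes, edges, excluded=frozenset()):
    """Partition a DAG into MaxMISO-style clusters, keyed by each cluster's root.

    nodes:    iterable of operation ids
    edges:    (producer, consumer) pairs of the dataflow graph
    excluded: operations kept as singleton clusters (illustrative assumption)
    """
    nodes = list(nodes)
    succs = {n: set() for n in nodes}
    preds = {n: set() for n in nodes}
    for producer, consumer in edges:
        succs[producer].add(consumer)
        preds[consumer].add(producer)

    # Visit nodes in reverse topological order: every consumer before its producer.
    remaining = {n: len(succs[n]) for n in nodes}
    ready = [n for n in nodes if remaining[n] == 0]
    cluster_of = {}   # node -> root of the cluster it belongs to
    clusters = {}     # root -> set of member nodes

    while ready:
        n = ready.pop()
        owners = {cluster_of[s] for s in succs[n]}
        # A node joins a cluster only if ALL of its consumers are in that one
        # cluster; otherwise its value escapes and it must root a new cluster.
        if len(owners) == 1 and n not in excluded and not (owners & excluded):
            root = next(iter(owners))
        else:
            root = n
        cluster_of[n] = root
        clusters.setdefault(root, set()).add(n)
        for p in preds[n]:
            remaining[p] -= 1
            if remaining[p] == 0:
                ready.append(p)
    return clusters


# Hypothetical basic block: ld -> add1 -> {mul, sub} -> add2, and shl -> {add2, st}
example_nodes = ["ld", "add1", "mul", "sub", "add2", "shl", "st"]
example_edges = [("ld", "add1"), ("add1", "mul"), ("add1", "sub"),
                 ("mul", "add2"), ("sub", "add2"), ("shl", "add2"), ("shl", "st")]
example = max_misos(example_nodes, example_edges, excluded={"ld", "st"})
```

In this example, `add1`, `mul`, `sub`, and `add2` collapse into one candidate instruction rooted at `add2`. `shl` feeds two different clusters, so it stays a separate root. Each resulting cluster has exactly one output, which is what makes it a candidate for a single AFU result port.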
