A High Performance Heterogeneous Architecture and Its Optimization Design

The widely adoption of media processing applications provides great challenges to high performance embedded processor design. This paper studies a Data Parallel Coprocessor architecture based on SDTA and architecture de-cisions are made for the best performance/cost ratio. Experimental results on a prototype show that SDTA has high performance to run many embedded media processing applications. The simplicity and flexibility of SDTA encourages for further development for its reconfigurable functionality.

[1]  Janak H. Patel,et al.  Data prefetching in multiprocessor vector cache memories , 1991, ISCA '91.

[2]  Jack E. Volder The CORDIC Trigonometric Computing Technique , 1959, IRE Trans. Electron. Comput..

[3]  Janak H. Patel,et al.  Stride directed prefetching in scalar processors , 1992, MICRO 1992.

[4]  Jason Fritts Multi-level memory prefetching for media and stream processing , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[5]  Henk Corporaal,et al.  MOVE: a framework for high-performance processor design , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[6]  Henk Corporaal,et al.  Code generation for transport triggered architectures , 1994, Code Generation for Embedded Processors.

[7]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[8]  Richard E. Kessler,et al.  Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[9]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.

[10]  H. Peter Hofstee,et al.  Power efficient processor architecture and the cell processor , 2005, 11th International Symposium on High-Performance Computer Architecture.

[11]  Anshul Kumar,et al.  ASIP design methodologies: survey and issues , 2001, VLSI Design 2001. Fourteenth International Conference on VLSI Design.

[12]  Michael J. Flynn,et al.  A comparison of hardware prefetching techniques for multimedia benchmarks , 1996, Proceedings of the Third IEEE International Conference on Multimedia Computing and Systems.

[13]  Michael W. Berry,et al.  Scientific workload characterization by loop-based analyses , 1992, PERV.

[14]  Jason E. Fritts,et al.  MediaBench II video: expediting the next generation of video systems research , 2005, IS&T/SPIE Electronic Imaging.