Application-specific instruction set processor for SoC implementation of modern signal processing algorithms

A novel application-specific instruction set processor (ASIP) for use in the construction of modern signal processing systems is presented. It is a flexible device that can be used to build array processor systems for the real-time implementation of functions such as singular-value decomposition (SVD) and QR decomposition (QRD), as well as other important matrix computations. It uses a coordinate rotation digital computer (CORDIC) module to perform arithmetic operations, and several approaches are adopted to achieve high performance, including pipelining of the micro-rotations, the use of parallel instructions, and a dual-bus architecture. In addition, a novel method for scale factor correction is presented which needs to be applied only once, at the end of the computation. This reduces computation time and further enhances performance. Methods are described which allow the processor to be used in reduced-dimension (i.e., folded) array processor structures that permit tradeoffs between hardware cost and performance. The net result is a flexible matrix computational processing element (PE) whose functionality can be changed under program control, for use in a wider range of scenarios than in previous work. Details are presented of the results of a design study which considers the application of this decomposition PE architecture in a combined SVD/QRD system and demonstrates that a combination of high performance and efficient silicon implementation is achievable.
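
As a rough illustration of the CORDIC operations mentioned above, the following Python sketch shows textbook rotation and vectoring modes built from shift-and-add micro-rotations, with the constant CORDIC gain removed only once after a chain of rotations. It is not the paper's scale factor correction method or its hardware design. The iteration count and the names `N_ITER`, `GAIN`, `cordic_rotate` and `cordic_vector` are assumptions made for this example, and floating point is used in place of the fixed-point shifts a hardware PE would use.

```python
import math

N_ITER = 24  # number of micro-rotations (assumed value for illustration)

# Elementary angles atan(2^-i) and the total gain of N_ITER micro-rotations.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_rotate(x, y, theta):
    """Rotation mode: rotate (x, y) by theta (|theta| below about 1.74 rad).

    The result is NOT scale corrected: it is GAIN times the rotated vector.
    """
    z = theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0      # direction of this micro-rotation
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]               # residual angle still to rotate
    return x, y


def cordic_vector(x, y):
    """Vectoring mode: drive y towards zero (assumes x > 0).

    Returns (GAIN * magnitude, angle of the input vector).
    """
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y >= 0 else 1.0      # rotate towards the x axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]               # accumulate the angle rotated through
    return x, z


# Chain two rotations and apply a single correction at the end.
x, y = cordic_rotate(1.0, 0.0, 0.3)
x, y = cordic_rotate(x, y, 0.4)
x, y = x / GAIN ** 2, y / GAIN ** 2      # one correction for both rotations

# Vectoring: magnitude and angle of (3, 4), corrected once.
r_scaled, angle = cordic_vector(3.0, 4.0)
r = r_scaled / GAIN
```

Because every micro-rotation is always performed, the gain is a known constant, so a chain of k rotations can be corrected by a single division by `GAIN ** k` rather than one correction per rotation. This is the general property that makes deferred correction possible in this simplified setting.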
