Custom instruction for NIOS II processor FFT implementation for image processing

Image processing can be considered signal processing in two dimensions (2D). Filtering is one of the basic image processing operations. Filtering in the frequency domain is computationally faster than the corresponding spatial-domain operation because the costly convolution is replaced by a pointwise multiplication in the frequency domain. Popular 2D transforms used in image processing include the Fast Fourier Transform (FFT), the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT). Common image resolutions are 640x480, 800x600, 1024x768 and 1280x1024. As can be seen, image dimensions are generally not powers of 2, so power-of-2 FFT lengths do not fit these sizes and the required transforms cannot be built from shorter power-of-2 Discrete Fourier Transform (DFT) blocks alone. FFT algorithms for composite, non-power-of-2 lengths, such as the Good-Thomas FFT algorithm, simplify the implementation logic required for such applications; they can therefore be implemented with low area and power consumption while still meeting timing constraints at high operating frequencies. The Good-Thomas algorithm, a Prime Factor FFT algorithm (PFA), maps a DFT whose length is a product of coprime factors onto shorter DFTs without intermediate twiddle-factor multiplications, which keeps the number of multiplications and additions low. We provide an Altera FPGA-based NIOS II custom instruction implementation of the Good-Thomas FFT algorithm to improve system performance, and compare it with an implementation of the same algorithm entirely in software.
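The Good-Thomas index mapping can be illustrated with a short software sketch. The Python code below is not the implementation described in the abstract; it is a minimal reference model, assuming a length N = N1 x N2 with coprime N1 and N2, and the function names `dft` and `good_thomas_fft` are hypothetical. It shows that the two stages of short DFTs are connected only by index permutations, with no twiddle-factor multiplications in between.

```python
import cmath
from math import gcd


def dft(x):
    """Direct O(n^2) DFT, used here as the short-length building block."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def good_thomas_fft(x, n1, n2):
    """DFT of length n1*n2 via the Good-Thomas (prime factor) mapping.

    Assumes gcd(n1, n2) == 1; requires Python 3.8+ for pow(a, -1, m).
    """
    n = n1 * n2
    if len(x) != n or gcd(n1, n2) != 1:
        raise ValueError("length must equal n1*n2 with coprime n1 and n2")

    # Input mapping: n = (n2*i1 + n1*i2) mod N arranges x as an n1-by-n2 grid.
    grid = [[x[(n2 * i1 + n1 * i2) % n] for i2 in range(n2)]
            for i1 in range(n1)]

    # Stage 1: length-n2 DFT along every row.
    grid = [dft(row) for row in grid]

    # Stage 2: length-n1 DFT along every column (no twiddle factors needed).
    cols = [dft([grid[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]

    # Output mapping from the Chinese Remainder Theorem:
    # k = (n2 * inv(n2 mod n1) * k1 + n1 * inv(n1 mod n2) * k2) mod N.
    a = pow(n2, -1, n1)
    b = pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(n2 * a * k1 + n1 * b * k2) % n] = cols[k2][k1]
    return out
```

For example, `good_thomas_fft(samples, 3, 5)` computes a 15-point DFT from 3-point and 5-point DFTs. The resolutions listed above factor in the same way: 640 = 5 x 128 and 480 = 3 x 5 x 32, so a row or column transform can be split into coprime-length stages by applying the mapping repeatedly.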
