A Low-Power Real-Time SIFT Descriptor Generation Engine for Full-HDTV Video Recognition

SUMMARY This paper describes a SIFT (Scale Invariant Feature Transform) descriptor generation engine which features a VLSI oriented SIFT algorithm, three-stage pipelined architecture and novel systolic array architectures for Gaussian filtering and key-point extraction. The ROIbased scheme has been employed for the VLSI oriented algorithm. The novel systolic array architecture drastically reduces the number of operation cycle and memory access. The cycle counts of Gaussian filtering module is reduced by 82%, compared with the SIMD architecture. The number of memory accesses of the Gaussian filtering module and the keypoint extraction module are reduced by 99.8% and 66% respectively, compared with the results obtained assuming the SIMD architecture. The proposed schemes provide processing capability for HDTV resolution video (1920 × 1080 pixels) at 30 frames per second (fps). The test chip has been fabricated in 65 nm CMOS technology and occupies 4.2 × 4. 2m m 2 containing 1.1M gates and 1.38 Mbit on-chip memory. The measured data demonstrates 38.2 mW power consumption at 78 MHz and 1.2 V.

[1]  Jack E. Volder The CORDIC Trigonometric Computing Technique , 1959, IRE Trans. Electron. Comput..

[2]  Jan-Michael Frahm,et al.  3D model matching with Viewpoint-Invariant Patches (VIP) , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Donghyun Kim,et al.  An 81.6 GOPS Object Recognition Processor Based on NoC and Visual Image Processing Memory , 2007, 2007 IEEE Custom Integrated Circuits Conference.

[4]  George A. Constantinides,et al.  A Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[5]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[6]  Masahiko Yoshimoto,et al.  Fast and Low-Memory-Bandwidth Architecture of SIFT Descriptor Generation with Scalability on Speed and Accuracy for VGA Video , 2010, 2010 International Conference on Field Programmable Logic and Applications.

[7]  Wenquan Feng,et al.  An architecture of optimised SIFT feature detection for an FPGA implementation of an image matcher , 2009, 2009 International Conference on Field-Programmable Technology.

[8]  Hoi-Jun Yoo,et al.  Real-Time Object Recognition with Neuro-Fuzzy Controlled Workload-Aware Task Pipelining , 2009, IEEE Micro.

[9]  James J. Little,et al.  Mobile Robot Localization and Mapping with Uncertainty using Scale-Invariant Visual Landmarks , 2002, Int. J. Robotics Res..

[10]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Different Scenes , 2008, ECCV.

[11]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[12]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Hoi-Jun Yoo,et al.  A 201.4 GOPS 496 mW Real-Time Multi-Object Recognition Processor With Bio-Inspired Neural Perception Engine , 2009, IEEE Journal of Solid-State Circuits.