An Embedded System-on-Chip Architecture for Real-time Visual Detection and Matching

Detecting and matching image features is a fundamental task in video analytics and computer vision systems: it establishes correspondences between two images taken at different time instants or from different viewpoints. However, its high computational complexity has made it a challenge for most embedded systems. This paper proposes a new FPGA-based embedded system architecture for feature detection and matching. The architecture combines scale-invariant feature transform (SIFT) feature detection with binary robust independent elementary features (BRIEF) feature description and matching, and it establishes accurate correspondences between consecutive frames of 720p (1280×720) video. The FPGA architecture for SIFT feature detection is optimized to reduce FPGA resource utilization, and the BRIEF feature description and matching are also implemented on the FPGA. As a result of these contributions, the proposed system performs feature detection and matching at 60 frames/s on 720p video, a processing speed that meets or exceeds the requirements of most real-world real-time video analytics applications. Extensive experiments demonstrate its efficiency and effectiveness.
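To make the BRIEF stage concrete, the following is a minimal software sketch of BRIEF description and brute-force Hamming-distance matching. It is not the paper's FPGA design. The patch size (31), descriptor length (256 bits), Gaussian sampling of test pairs with sigma = patch/5, the 5×5 box filter used in place of Gaussian pre-smoothing, and the acceptance threshold `max_dist=64` are all illustrative assumptions, and every function name here is hypothetical. Keypoints are taken as given, standing in for the SIFT detector's output.

```python
import random

PATCH = 31          # assumed patch side length in pixels
N_BITS = 256        # assumed descriptor length (BRIEF-256)
HALF = PATCH // 2

def make_test_pairs(n_bits=N_BITS, seed=0):
    """Sample n_bits point pairs (offsets from the keypoint) from an isotropic
    Gaussian with sigma = PATCH / 5, clipped to the patch. The pattern is
    fixed once and shared by every keypoint."""
    rng = random.Random(seed)
    sigma = PATCH / 5.0

    def sample():
        return max(-HALF, min(HALF, round(rng.gauss(0.0, sigma))))

    return [((sample(), sample()), (sample(), sample())) for _ in range(n_bits)]

def box_smooth(img, x, y, r=2):
    """Mean intensity over a (2r+1)x(2r+1) window, a simplified stand-in for
    Gaussian pre-smoothing. The centre is clamped so the window is never empty."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    total = count = 0
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            total += img[yy][xx]
            count += 1
    return total / count

def brief_descriptor(img, kp, pairs):
    """Pack the binary tests I(p) < I(q) into one int (bit i = test i)."""
    kx, ky = kp
    desc = 0
    for i, ((px, py), (qx, qy)) in enumerate(pairs):
        if box_smooth(img, kx + px, ky + py) < box_smooth(img, kx + qx, ky + qy):
            desc |= 1 << i
    return desc

def hamming(a, b):
    """Number of differing bits: XOR followed by a population count."""
    return bin(a ^ b).count("1")

def match(descs_a, descs_b, max_dist=64):
    """Brute-force nearest neighbour in Hamming space; keep (i, j, dist)
    for matches whose distance does not exceed max_dist."""
    if not descs_b:
        return []
    matches = []
    for i, da in enumerate(descs_a):
        j, d = min(((j, hamming(da, db)) for j, db in enumerate(descs_b)),
                   key=lambda t: t[1])
        if d <= max_dist:
            matches.append((i, j, d))
    return matches
```

For example, describing keypoints in a random test frame and in a copy of it shifted by a few pixels should pair each keypoint with its shifted counterpart at Hamming distance 0, as long as the patches stay away from the image borders. Because matching reduces to XOR and bit counting rather than floating-point distance computation, binary descriptors are attractive for hardware implementation. The brute-force search, however, costs O(N·M) comparisons between two frames with N and M features.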
