Recent advances in features extraction and description algorithms: A comprehensive survey

Computer vision is one of the most active research fields in information technology today. Giving machines and robots the ability to see and comprehend the surrounding world at the speed of sight creates endless potential applications and opportunities. Feature detection and description algorithms can indeed be considered the retina of the eyes of such machines and robots. However, these algorithms are typically computationally intensive, which prevents them from achieving real-time performance at the speed of sight. In addition, they differ in their capabilities, and some work better than others on specific types of input. As such, it is essential to report compactly their pros and cons as well as their performance and recent advances. This paper provides a comprehensive overview of the state of the art and recent advances in feature detection and description algorithms. Specifically, it starts with an overview of fundamental concepts. It then compares, reports, and discusses the performance and capabilities of these algorithms. The Maximally Stable Extremal Regions (MSER) algorithm and the Scale Invariant Feature Transform (SIFT) algorithm, being two of the best of their type, are selected, and their recent algorithmic derivatives are reported.
