Fast MSER

Maximally Stable Extremal Regions (MSER) algorithms are based on the component tree and are used to detect invariant regions. OpenCV MSER, the most popular MSER implementation, uses a linked list to associate pixels with extremal regions (ERs). The data structure of an ER stores the head and tail nodes of its linked list, which makes OpenCV MSER hard to parallelize using existing parallel component tree strategies. In addition, pixel extraction (i.e., extracting the pixels in MSERs) in OpenCV MSER is very slow. In this paper, we propose two novel MSER algorithms, called Fast MSER V1 and V2. They first divide an image into several spatial partitions, then construct sub-trees and doubly linked lists (for V1) or a labelled image (for V2) on the partitions in parallel. V1 uses a novel sub-tree merging algorithm to merge the sub-trees into the final tree, merging the doubly linked lists in the process, whereas V2 merges the sub-trees using an existing merging algorithm. Finally, MSERs are recognized, and their pixels are extracted by two novel pixel extraction methods that exploit the fact that many pixels are duplicated between parent and child MSERs. Both V1 and V2 outperform three open-source MSER algorithms (they are 28 and 26 times faster than OpenCV MSER, respectively) and reduce the memory used to store the pixels in MSERs by 78%.
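
To illustrate why pixel duplication between parent and child regions matters, the following is a minimal sketch of one generic way to avoid storing the same pixel more than once in a region tree. It is not the extraction method of Fast MSER V1 or V2, whose details are not given here. The function name `layout_shared_pixels`, the `children` and `own_pixels` dictionaries, and the toy tree are all hypothetical and chosen only for illustration. The idea is to write pixels into a single buffer in depth-first order, so that every region, together with all of its descendants, occupies one contiguous span of that buffer.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def layout_shared_pixels(
    children: Dict[int, List[int]],
    own_pixels: Dict[int, List[Pixel]],
    root: int,
) -> Tuple[List[Pixel], Dict[int, Tuple[int, int]]]:
    """Store every pixel once; return (buffer, spans).

    spans[node] = (start, end) such that buffer[start:end] holds all
    pixels of the region rooted at `node`, including its descendants.
    """
    buffer: List[Pixel] = []
    spans: Dict[int, Tuple[int, int]] = {}
    start_of: Dict[int, int] = {}
    # Iterative depth-first traversal: a node is visited once on entry
    # (to record where its span begins) and once on exit (to append the
    # pixels it adds on top of its children and close the span).
    stack = [(root, False)]
    while stack:
        node, exiting = stack.pop()
        if exiting:
            buffer.extend(own_pixels.get(node, []))
            spans[node] = (start_of[node], len(buffer))
        else:
            start_of[node] = len(buffer)
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return buffer, spans


# Hypothetical toy tree: region 0 contains regions 1 and 2; region 1 contains region 3.
children = {0: [1, 2], 1: [3]}
own_pixels = {0: [(0, 0)], 1: [(0, 1)], 2: [(1, 0), (1, 1)], 3: [(0, 2)]}

buffer, spans = layout_shared_pixels(children, own_pixels, root=0)

# Storing a separate pixel list per region would cost the sum of all region sizes.
naive_cost = sum(end - start for start, end in spans.values())
shared_cost = len(buffer)
```

In this toy example each region is recovered as `buffer[start:end]`, and the shared buffer holds 5 pixels where separate per-region lists would hold 10. The saving grows with the depth of nesting, which is the kind of redundancy the abstract refers to.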
