Linear Time Maximally Stable Extremal Regions

In this paper we present a new algorithm for computing Maximally Stable Extremal Regions (MSER), as invented by Matas et al. The standard algorithm makes use of a union-find data structure and takes quasi-linear time in the number of pixels. The new algorithm provides exactly identical results in true worst-case linear time. Moreover, the new algorithm uses significantly less memory and has better cache-locality, resulting in faster execution. Our CPU implementation performs twice as fast as a state-of-the-art FPGA implementation based on the standard algorithm. The new algorithm is based on a different computational ordering of the pixels, which is suggested by another immersion analogy than the one corresponding to the standard connected-component algorithm. With the new computational ordering, the pixels considered or visited at any point during computation consist of a single connected component of pixels in the image, resembling a flood-fill that adapts to the grey-level landscape. The computation only needs a priority queue of candidate pixels (the boundary of the single connected component), a single bit image masking visited pixels, and information for as many components as there are grey-levels in the image. This is substantially more compact in practice than the standard algorithm, where a large number of connected components must be considered in parallel. The new algorithm can also generate the component tree of the image in true linear time. The result shows that MSER detection is not tied to the union-find data structure, which may open more possibilities for parallelization.

[1]  Bill Triggs,et al.  Detecting Keypoints with Stable Position, Orientation, and Scale under Illumination Changes , 2004, ECCV.

[2]  Matthew A. Brown,et al.  Invariant Features from Interest Point Groups , 2002, BMVC.

[3]  Cordelia Schmid,et al.  Evaluation of Interest Point Detectors , 2000, International Journal of Computer Vision.

[4]  Stepán Obdrzálek,et al.  Object Recognition using Local Affine Frames on Distinguished Regions , 2002, BMVC.

[5]  Horst Bischof,et al.  3D Segmentation by Maximally Stable Volumes (MSVs) , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[6]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[7]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[9]  Robert E. Tarjan,et al.  A Linear-Time Algorithm for a Special Case of Disjoint Set Union , 1985, J. Comput. Syst. Sci..

[10]  Horst Bischof,et al.  Efficient Maximally Stable Extremal Region (MSER) Tracking , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Per-Erik Forssén,et al.  Maximally Stable Colour Regions for Recognition and Matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Gilles Bertrand,et al.  Quasi-Linear Algorithms for the Topological Watershed , 2005, Journal of Mathematical Imaging and Vision.

[13]  Andrew Zisserman,et al.  An Affine Invariant Salient Region Detector , 2004, ECCV.

[14]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[15]  W. James MacLean,et al.  Real-Time Extraction of Maximally Stable Extremal Regions on an FPGA , 2007, 2007 IEEE International Symposium on Circuits and Systems.

[16]  Jos B. T. M. Roerdink,et al.  The Watershed Transform: Definitions, Algorithms and Parallelization Strategies , 2000, Fundam. Informaticae.

[17]  Tony Lindeberg,et al.  Feature Detection with Automatic Scale Selection , 1998, International Journal of Computer Vision.

[18]  Robert E. Tarjan,et al.  Data structures and network algorithms , 1983, CBMS-NSF regional conference series in applied mathematics.

[19]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[20]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[21]  Luc Vincent,et al.  Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[23]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..