Real-Time Hierarchical Supervoxel Segmentation via a Minimum Spanning Tree

Supervoxel segmentation algorithms have been applied as a preprocessing step for many vision tasks. However, existing algorithms cannot generate, in real time, hierarchical supervoxel segmentations that preserve spatiotemporal boundaries well, which prevents downstream applications from processing video accurately and efficiently. In this paper, we propose a real-time hierarchical supervoxel segmentation algorithm based on the minimum spanning tree (MST), which achieves state-of-the-art accuracy while running at least $11\times$ faster than existing methods. In particular, we introduce a dynamic graph updating operation into the iterative construction process of the MST, which geometrically decreases the numbers of vertices and edges. In this way, the proposed method is able to generate supervoxels at arbitrary scales on the fly. We prove that our algorithm produces hierarchical supervoxels in $O(n)$ time, where $n$ denotes the number of voxels in the input video. Quantitative and qualitative evaluations on public benchmarks demonstrate that the proposed algorithm significantly outperforms state-of-the-art algorithms in both supervoxel segmentation accuracy and computational efficiency. Furthermore, we demonstrate the effectiveness of the proposed method on a downstream application, video object segmentation.
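
As an illustration of how iterative MST construction can be interleaved with graph contraction, the following minimal sketch uses classical Borůvka rounds: each component picks its lightest outgoing edge, the selected edges are merged, and the graph is rebuilt on the merged components so that later rounds work on a smaller graph. This is an assumption made for illustration; the abstract does not state that the method uses Borůvka's algorithm, and the function name `boruvka_hierarchy`, the edge representation, and the per-round label output are all hypothetical. The sketch also uses a plain dictionary for deduplicating parallel edges and does not attempt to reproduce the $O(n)$ bound or arbitrary-scale output claimed above.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); hypothetical representation


def boruvka_hierarchy(num_vertices: int, edges: List[Edge]) -> List[List[int]]:
    """Return one label array per contraction round, finest to coarsest.

    Illustrative sketch only: generic Boruvka rounds with graph contraction,
    not the paper's actual implementation.
    """
    labels = list(range(num_vertices))  # component id of each original vertex
    n = num_vertices
    levels: List[List[int]] = []

    while n > 1 and edges:
        # 1. For every current vertex, find its lightest incident edge.
        #    Ties are broken by edge index so the choice is deterministic.
        best: List[Optional[int]] = [None] * n
        for idx, (u, v, w) in enumerate(edges):
            for c in (u, v):
                b = best[c]
                if b is None or (w, idx) < (edges[b][2], b):
                    best[c] = idx

        # 2. Merge the endpoints of the selected edges with union-find.
        parent = list(range(n))

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for idx in best:
            if idx is not None:
                ru, rv = find(edges[idx][0]), find(edges[idx][1])
                if ru != rv:
                    parent[ru] = rv

        # 3. Give the merged components compact ids 0..k-1.
        new_id: Dict[int, int] = {}
        comp = [new_id.setdefault(find(c), len(new_id)) for c in range(n)]

        # 4. Update the graph: drop self-loops and keep only the lightest
        #    edge between each pair of merged components.
        lightest: Dict[Tuple[int, int], float] = {}
        for u, v, w in edges:
            cu, cv = comp[u], comp[v]
            if cu == cv:
                continue
            key = (min(cu, cv), max(cu, cv))
            if key not in lightest or w < lightest[key]:
                lightest[key] = w
        edges = [(a, b, w) for (a, b), w in lightest.items()]

        # 5. Record this level of the hierarchy for the original vertices.
        labels = [comp[l] for l in labels]
        n = len(new_id)
        levels.append(labels)

    return levels


# Toy example: a 4-vertex path with a strong boundary in the middle.
toy_levels = boruvka_hierarchy(4, [(0, 1, 1.0), (1, 2, 5.0), (2, 3, 1.0)])
```

In the toy example, the first round merges the two cheap edges and leaves the expensive middle edge as the only edge of the contracted graph; the second round merges across it. Because every vertex joins at least one merge per round, the vertex count at least halves each time, which is the kind of geometric decrease the abstract refers to.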
