Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid

In modern computer vision, images are typically represented as a fixed uniform grid with some stride and processed via a deep convolutional neural network. We argue that deforming the grid to better align with the high-frequency image content is a more effective strategy. We introduce \emph{Deformable Grid} DefGrid, a learnable neural network module that predicts location offsets of vertices of a 2-dimensional triangular grid, such that the edges of the deformed grid align with image boundaries. We showcase our DefGrid in a variety of use cases, i.e., by inserting it as a module at various levels of processing. We utilize DefGrid as an end-to-end \emph{learnable geometric downsampling} layer that replaces standard pooling methods for reducing feature resolution when feeding images into a deep CNN. We show significantly improved results at the same grid resolution compared to using CNNs on uniform grids for the task of semantic segmentation. We also utilize DefGrid at the output layers for the task of object mask annotation, and show that reasoning about object boundaries on our predicted polygonal grid leads to more accurate results over existing pixel-wise and curve-based approaches. We finally showcase DefGrid as a standalone module for unsupervised image partitioning, showing superior performance over existing approaches. Project website: this http URL

[1]  Gerhard Stephan,et al.  Segmented anisotropic ssTEM dataset of neural tissue , 2013 .

[2]  Sanja Fidler,et al.  Gated-SCNN: Gated Shape CNNs for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Sanja Fidler,et al.  Fast Interactive Object Annotation With Curve-GCN , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Pascal Fua,et al.  Free-Shape Polygonal Object Localization , 2014, ECCV.

[7]  智一 吉田,et al.  Efficient Graph-Based Image Segmentationを用いた圃場図自動作成手法の検討 , 2014 .

[8]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Sanja Fidler,et al.  Real-time coarse-to-fine topologically preserving segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Luc Van Gool,et al.  Deep Extreme Cut: From Extreme Points to Object Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Umar Mohammed,et al.  Superpixel lattices , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[13]  Olga Sorkine-Hornung,et al.  Bounded biharmonic weights for real-time deformation , 2011, Commun. ACM.

[14]  Peter V. Gehler,et al.  Superpixel Convolutional Networks Using Bilateral Inceptions , 2015, ECCV.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Wojciech Matusik,et al.  Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks , 2018, ECCV.

[17]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[19]  Sabine Süsstrunk,et al.  Superpixels and Polygons Using Simple Non-iterative Clustering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Luc Vincent,et al.  Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Sven J. Dickinson,et al.  TurboPixels: Fast Superpixels Using Geometric Flows , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[25]  Sanja Fidler,et al.  Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++ , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Leonidas J. Guibas,et al.  DeepPrimitive: Image decomposition by layered primitive detection , 2018, Computational Visual Media.

[27]  Florent Lafarge,et al.  Image partitioning into convex polygons , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Yuwen Xiong,et al.  PolyTransform: Deep Polygon Transformer for Instance Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Jaakko Lehtinen,et al.  Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer , 2019, NeurIPS.

[30]  Nicholas Ayache,et al.  A collaborative resource to build consensus for automated left ventricular segmentation of cardiac MR images , 2014, Medical Image Anal..

[31]  Sanja Fidler,et al.  Object Instance Annotation With Deep Extreme Level Set Evolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Santiago Manen,et al.  Online Video SEEDS for Temporal Window Objectness , 2013, 2013 IEEE International Conference on Computer Vision.

[33]  Raquel Urtasun,et al.  Efficient Joint Segmentation, Occlusion Labeling, Stereo and Flow Estimation , 2014, ECCV.

[34]  Luc Van Gool,et al.  Superpixel meshes for fast edge-preserving surface reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Sanja Fidler,et al.  Annotating Object Instances with a Polygon-RNN , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  S. Süsstrunk,et al.  SLIC Superpixels ? , 2010 .

[40]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Rama Chellappa,et al.  Entropy rate superpixel segmentation , 2011, CVPR 2011.

[42]  Jan Kautz,et al.  Learning Superpixels with Segmentation-Aware Affinity Loss , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Leonidas J. Guibas,et al.  DeepSpline: Data-Driven Reconstruction of Parametric Curves and Surfaces , 2019, ArXiv.