End-to-End Learning for Graph Decomposition

Deep neural networks provide powerful tools for pattern recognition, while classical graph algorithms are widely used to solve combinatorial problems. Many computer vision tasks combine elements of both pattern recognition and graph reasoning. In this paper, we study how to combine deep networks and graph decomposition in a single end-to-end trainable framework. Specifically, the minimum cost multicut problem is first converted to an unconstrained binary cubic problem in which the cycle consistency constraints are incorporated into the objective function. The new optimization problem can be viewed as a Conditional Random Field (CRF) whose random variables are the binary edge labels; the cycle constraints enter the CRF as high-order potentials. A standard Convolutional Neural Network (CNN) provides the front-end features for the fully differentiable CRF, and the parameters of both parts are optimized jointly, end to end. The efficacy of the proposed learning algorithm is demonstrated in experiments on clustering MNIST images and on the challenging task of real-world multi-person pose estimation.
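To make the conversion step concrete, the following is a minimal sketch of how cycle consistency can be folded into the objective as cubic terms. It is an illustration under stated assumptions, not the formulation used in the paper: it assumes a complete graph (so that triangle constraints suffice), the convention that label 1 means "cut", and a hand-picked penalty weight `gamma`. The function names, the toy costs, and the brute-force solver are all hypothetical and serve only to check the idea on four nodes.

```python
from itertools import combinations, product

def triangle_penalty(a, b, c):
    """Cubic term in the binary labels a, b, c (1 = cut, 0 = join).

    It equals 1 exactly when one edge of the triangle is cut and the other
    two are joined, which is the configuration a cycle constraint forbids.
    """
    return a * (1 - b) * (1 - c) + (1 - a) * b * (1 - c) + (1 - a) * (1 - b) * c

def cubic_objective(costs, labels, n, gamma):
    """Linear multicut cost plus gamma times the number of violated triangles."""
    linear = sum(costs[e] * labels[e] for e in costs)
    violations = sum(
        triangle_penalty(labels[(i, j)], labels[(j, k)], labels[(i, k)])
        for i, j, k in combinations(range(n), 3)
    )
    return linear + gamma * violations

def solve_brute_force(costs, n, gamma):
    """Enumerate all edge labelings (only feasible for tiny graphs)."""
    edges = sorted(costs)
    best_value, best_labels = None, None
    for bits in product((0, 1), repeat=len(edges)):
        labels = dict(zip(edges, bits))
        value = cubic_objective(costs, labels, n, gamma)
        if best_value is None or value < best_value:
            best_value, best_labels = value, labels
    return best_value, best_labels

def is_consistent(labels, n):
    """True if no triangle has exactly one cut edge."""
    return all(
        triangle_penalty(labels[(i, j)], labels[(j, k)], labels[(i, k)]) == 0
        for i, j, k in combinations(range(n), 3)
    )

# Toy instance (assumed values): positive cost favors joining, negative favors cutting.
N_NODES = 4
COSTS = {
    (0, 1): 2.0, (1, 2): 2.0, (0, 2): -1.0,
    (0, 3): -3.0, (1, 3): -3.0, (2, 3): -3.0,
}

# Without the penalty, the minimizer cuts edge (0, 2) while joining (0, 1) and (1, 2).
_, unconstrained = solve_brute_force(COSTS, N_NODES, gamma=0.0)
# With a sufficiently large penalty, the minimizer is a valid decomposition.
_, penalized = solve_brute_force(COSTS, N_NODES, gamma=10.0)
print(is_consistent(unconstrained, N_NODES), is_consistent(penalized, N_NODES))
```

On this toy instance the unpenalized minimizer is inconsistent, while the penalized one groups nodes 0, 1, and 2 and separates node 3. Because the penalty is a polynomial in the edge labels, it remains well defined when the labels are relaxed to values in [0, 1], which is what lets such terms act as high-order potentials in a differentiable model.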
