Joint Training of Generic CNN-CRF Models with Stochastic Optimization

We propose a new CNN-CRF end-to-end learning framework, which is based on joint stochastic optimization with respect to both Convolutional Neural Network (CNN) and Conditional Random Field (CRF) parameters. While stochastic gradient descent is a standard technique for CNN training, it was not used for joint models so far. We show that our learning method is (i) general, i.e. it applies to arbitrary CNN and CRF architectures and potential functions; (ii) scalable, i.e. it has a low memory footprint and straightforwardly parallelizes on GPUs; (iii) easy in implementation. Additionally, the unified CNN-CRF optimization approach simplifies a potential hardware implementation. We empirically evaluate our method on the task of semantic labeling of body parts in depth images and show that it compares favorably to competing techniques.

[1]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Raquel Urtasun,et al.  Fully Connected Deep Structured Networks , 2015, ArXiv.

[3]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[4]  Jian Sun,et al.  Global refinement of random forest , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Alan L. Yuille,et al.  Learning Deep Structured Models , 2014, ICML.

[6]  Carsten Rother,et al.  DenseCut: Densely Connected CRFs for Realtime GrabCut , 2015, Comput. Graph. Forum.

[7]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[8]  Justin Domke,et al.  Learning Graphical Model Parameters with Approximate Marginal Inference , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  James C. Spall,et al.  Introduction to stochastic search and optimization - estimation, simulation, and control , 2003, Wiley-Interscience series in discrete mathematics and optimization.

[10]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[12]  Alan L. Yuille,et al.  The Convergence of Contrastive Divergences , 2004, NIPS.

[13]  Veselin Stoyanov,et al.  Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.

[14]  Donald Geman,et al.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images , 1984 .

[15]  H. Robbins A Stochastic Approximation Method , 1951 .

[16]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[17]  Wayne Ieee,et al.  Entropy Nets: From Decision Trees to Neural Networks , 1990 .

[18]  Xiaoxiao Li,et al.  Semantic Image Segmentation via Deep Parsing Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[20]  Peter V. Gehler,et al.  Human Pose Estimation with Fields of Parts , 2014, ECCV.

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Andrew Adams,et al.  Fast High‐Dimensional Filtering Using the Permutohedral Lattice , 2010, Comput. Graph. Forum.

[23]  Andrew McCallum,et al.  Piecewise Training for Undirected Models , 2005, UAI.

[24]  R. Zemel,et al.  Multiscale conditional random fields for image labeling , 2004, CVPR 2004.

[25]  Misha Denil,et al.  Consistency of Online Random Forests , 2013, ICML.

[26]  James C. Spall,et al.  Introduction to Stochastic Search and Optimization. Estimation, Simulation, and Control (Spall, J.C. , 2007 .

[27]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[28]  Miguel Á. Carreira-Perpiñán,et al.  Multiscale conditional random fields for image labeling , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[29]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[30]  Adrian Barbu,et al.  Training an Active Random Field for Real-Time Image Denoising , 2009, IEEE Transactions on Image Processing.

[31]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Charles J. Geyer,et al.  Practical Markov Chain Monte Carlo , 1992 .

[33]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[34]  Sebastian Nowozin,et al.  Decision tree fields , 2011, 2011 International Conference on Computer Vision.

[35]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[37]  Martial Hebert,et al.  Learning message-passing inference machines for structured prediction , 2011, CVPR 2011.

[38]  Steffen L. Lauritzen,et al.  Graphical models in R , 1996 .

[39]  Arthur Gretton,et al.  Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees , 2011, AISTATS.

[40]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[41]  James C. Spall,et al.  Introduction to stochastic search and optimization - estimation, simulation, and control , 2003, Wiley-Interscience series in discrete mathematics and optimization.

[42]  Tijmen Tieleman,et al.  Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.