Proximal Deep Structured Models

Many problems in real-world applications involve predicting continuous-valued random variables that are statistically related. In this paper, we propose a powerful deep structured model that is able to learn complex non-linear functions which encode the dependencies between continuous output variables. We show that inference in our model using proximal methods can be efficiently solved as a feed-foward pass of a special type of deep recurrent neural network. We demonstrate the effectiveness of our approach in the tasks of image denoising, depth refinement and optical flow estimation.

[1]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[2]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Stefan Harmeling,et al.  Image denoising: Can plain neural networks compete with BM3D? , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[5]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[6]  Martial Hebert,et al.  Learning message-passing inference machines for structured prediction , 2011, CVPR 2011.

[7]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Ruslan Salakhutdinov,et al.  Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.

[9]  Antonin Chambolle,et al.  A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging , 2011, Journal of Mathematical Imaging and Vision.

[10]  William T. Freeman,et al.  Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology , 1999, Neural Computation.

[11]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[12]  Yair Weiss,et al.  From learning models of natural image patches to whole image restoration , 2011, 2011 International Conference on Computer Vision.

[13]  Alan L. Yuille,et al.  Learning Deep Structured Models , 2014, ICML.

[14]  David A. McAllester,et al.  Particle Belief Propagation , 2009, AISTATS.

[15]  Samy Bengio,et al.  Large-Scale Object Classification Using Label Relation Graphs , 2014, ECCV.

[16]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[17]  Raquel Urtasun,et al.  Fully Connected Deep Structured Networks , 2015, ArXiv.

[18]  Michael J. Black,et al.  Fields of Experts: a framework for learning image priors , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Sebastian Nowozin,et al.  Cascades of Regression Tree Fields for Image Restoration , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Mathieu Salzmann,et al.  Continuous Inference in Graphical Models with Polynomial Energies , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Donald Geman,et al.  Nonlinear image recovery with half-quadratic regularization , 1995, IEEE Trans. Image Process..

[23]  Stefan Roth,et al.  Shrinkage Fields for Effective Image Restoration , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  B. Mercier,et al.  A dual algorithm for the solution of nonlinear variational problems via finite element approximation , 1976 .

[25]  Rob Fergus,et al.  Fast Image Deconvolution using Hyper-Laplacian Priors , 2009, NIPS.

[26]  Pushmeet Kohli,et al.  A Convex Discrete-Continuous Approach for Markov Random Fields , 2012, ECCV.

[27]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[30]  Tara N. Sainath,et al.  FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .

[31]  Justin Domke,et al.  Generic Methods for Optimization-Based Modeling , 2012, AISTATS.

[32]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[33]  Wei Yu,et al.  On learning optimized reaction diffusion processes for effective image restoration , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Andrew McCallum,et al.  Structured Prediction Energy Networks , 2015, ICML.

[35]  Michael Isard,et al.  Nonparametric belief propagation , 2010, Commun. ACM.

[36]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[37]  Raquel Urtasun,et al.  Efficient Inference of Continuous Markov Random Fields with Polynomial Potentials , 2014, NIPS.

[38]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Sebastian Nowozin,et al.  Structured Learning and Prediction in Computer Vision , 2011, Found. Trends Comput. Graph. Vis..

[40]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[41]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[42]  Guillermo Sapiro,et al.  Non-local sparse models for image restoration , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[43]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[44]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[45]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[46]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Antonio Criminisi,et al.  Filter Forests for Learning Data-Dependent Convolutional Kernels , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Alessandro Foi,et al.  Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering , 2007, IEEE Transactions on Image Processing.

[49]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[50]  Jan Kautz,et al.  PMBP: PatchMatch Belief Propagation for Correspondence Field Estimation , 2014, International Journal of Computer Vision.

[51]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[52]  Stephen P. Boyd,et al.  Proximal Algorithms , 2013, Found. Trends Optim..