Learning Deep Structured Models

Many problems in real-world applications involve predicting several random variables that are statistically related. Markov random fields (MRFs) are a great mathematical tool to encode such dependencies. The goal of this paper is to combine MRFs with deep learning to estimate complex representations while taking into account the dependencies between the output random variables. Towards this goal, we propose a training algorithm that is able to learn structured models jointly with deep features that form the MRF potentials. Our approach is efficient as it blends learning and inference and makes use of GPU acceleration. We demonstrate the effectiveness of our algorithm in the tasks of predicting words from noisy images, as well as tagging of Flickr photographs. We show that joint learning of the deep features and the MRF parameters results in significant performance gains.

[1]  Paul Smolensky,et al.  Information processing in dynamical systems: foundations of harmony theory , 1986 .

[2]  Geoffrey E. Hinton,et al.  Learning and relearning in Boltzmann machines , 1986 .

[3]  Paul Tseng,et al.  Relaxation methods for problems with strictly convex separable costs and linear constraints , 1987, Math. Program..

[4]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[5]  John S. Bridle,et al.  Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters , 1989, NIPS.

[6]  Yoshua Bengio,et al.  Global training of document processing systems using graph transformer networks , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[8]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[9]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[10]  Tom Heskes,et al.  Fractional Belief Propagation , 2002, NIPS.

[11]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[12]  Ben Taskar,et al.  Learning structured prediction models: a large margin approach , 2005, ICML.

[13]  William T. Freeman,et al.  Constructing free-energy approximations and generalized belief propagation algorithms , 2005, IEEE Transactions on Information Theory.

[14]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[15]  Martin J. Wainwright,et al.  A new class of upper bounds on the log partition function , 2002, IEEE Transactions on Information Theory.

[16]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[17]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[18]  Tommi S. Jaakkola,et al.  Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations , 2007, NIPS.

[19]  Yoshua Bengio,et al.  An empirical evaluation of deep architectures on problems with many factors of variation , 2007, ICML '07.

[20]  Yair Weiss,et al.  MAP Estimation, Linear Programming and Belief Propagation with Convex Free Energies , 2007, UAI.

[21]  Tommi S. Jaakkola,et al.  Tightening LP Relaxations for MAP using Message Passing , 2008, UAI.

[22]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[23]  Eric Fosler-Lussier,et al.  Conditional Random Fields for Integrating Local Discriminative Classifiers , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[25]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[26]  J. Schmidhuber,et al.  A Novel Connectionist System for Unconstrained Handwriting Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Amir Globerson,et al.  Convergent message passing algorithms - a unifying view , 2009, UAI.

[28]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[29]  Manik Varma,et al.  Character Recognition in Natural Images , 2009, VISAPP.

[30]  Jian Peng,et al.  Conditional Neural Fields , 2009, NIPS.

[31]  Tommi S. Jaakkola,et al.  Learning Efficiently with Approximate Inference via Dual Losses , 2010, ICML.

[32]  Eric Fosler-Lussier,et al.  Backpropagation training for multilayer conditional random field based phone recognition , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[33]  Tamir Hazan,et al.  Norm-Product Belief Propagation: Primal-Dual Message-Passing for Approximate Inference , 2009, IEEE Transactions on Information Theory.

[34]  Thierry Artières,et al.  Neural conditional random fields , 2010, AISTATS.

[35]  Tamir Hazan,et al.  A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction , 2010, NIPS.

[36]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[37]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[38]  Sebastian Nowozin,et al.  Decision tree fields , 2011, 2011 International Conference on Computer Vision.

[39]  Sanja Fidler,et al.  Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Marc Pollefeys,et al.  Efficient Structured Prediction with Latent Variables for General Graphical Models , 2012, ICML.

[41]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[42]  Jian Peng,et al.  A conditional neural fields model for protein threading , 2012, Bioinform..

[43]  Andrew Y. Ng,et al.  Convolutional-Recursive Deep Learning for 3D Object Classification , 2012, NIPS.

[44]  Geoffrey E. Hinton,et al.  An Efficient Learning Procedure for Deep Boltzmann Machines , 2012, Neural Computation.

[45]  Sebastian Nowozin,et al.  Regression Tree Fields — An efficient, non-parametric approach to image labeling problems , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Justin Domke,et al.  Structured Learning via Logistic Regression , 2013, NIPS.

[47]  Alexander Gerhard Schwing,et al.  Inference and learning algorithms with applications to 3D indoor scene understanding , 2013 .

[48]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[49]  Sebastian Nowozin,et al.  Learning Convex QP Relaxations for Structured Prediction , 2013, ICML.

[50]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[51]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[52]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[54]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[55]  Jia Xu,et al.  Tell Me What You See and I Will Show You Where It Is , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Samy Bengio,et al.  Large-Scale Object Classification Using Label Relation Graphs , 2014, ECCV.

[57]  Yann LeCun,et al.  Understanding Deep Architectures using a Recursive Convolutional Network , 2013, ICLR.

[58]  Richard S. Zemel,et al.  High Order Regularization for Semi-Supervised Learning of Structured Output Problems , 2014, ICML.

[59]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[60]  Raquel Urtasun,et al.  Fully Connected Deep Structured Networks , 2015, ArXiv.

[61]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).