论文信息 - Conditional Random Fields as Recurrent Neural Networks

Conditional Random Fields as Recurrent Neural Networks

Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. To this end, we formulate Conditional Random Fields with Gaussian pairwise potentials and mean-field approximate inference as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual back-propagation algorithm, avoiding offline post-processing methods for object delineation. We apply the proposed method to the problem of semantic image segmentation, obtaining top results on the challenging Pascal VOC 2012 segmentation benchmark.

[1] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .

[2] James A. Anderson,et al. Neurocomputing: Foundations of Research , 1988 .

[3] Michael C. Mozer,et al. A Focused Backpropagation Algorithm for Temporal Pattern Recognition , 1989, Complex Syst..

[4] Yoshua Bengio,et al. Globally Trained Handwritten Word Recognizer Using Spatial Representation, Convolutional Neural Networks, and Hidden Markov Models , 1993, NIPS.

[5] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations, vol 2: Psychological and Biological Models , 1994 .

[6] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[7] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[8] Roberto Manduchi,et al. Bilateral filtering for gray and color images , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[9] Simon Haykin,et al. GradientBased Learning Applied to Document Recognition , 2001 .

[10] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[11] Sekhar Tatikonda,et al. Loopy Belief Propogation and Gibbs Measures , 2002, UAI.

[12] Zhuowen Tu,et al. Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.

[13] 박현준,et al. Back Propagation , 1995, Artificial Neural Networks.

[14] Antonio Criminisi,et al. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[15] Frédo Durand,et al. A Fast Approximation of the Bilateral Filter Using a Signal Processing Approach , 2006, International Journal of Computer Vision.

[16] Joseph F. Murray,et al. Supervised Learning of Image Restoration with Convolutional Networks , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17] Zhuowen Tu,et al. Auto-context and its application to high-level vision tasks , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Roberto Cipolla,et al. Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19] Pushmeet Kohli,et al. Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20] Adrian Barbu,et al. Training an Active Random Field for Real-Time Image Denoising , 2009, IEEE Transactions on Image Processing.

[21] Jian Peng,et al. Conditional Neural Fields , 2009, NIPS.

[22] Andrew Adams,et al. Fast High‐Dimensional Filtering Using the Permutohedral Lattice , 2010, Comput. Graph. Forum.

[23] Thierry Artières,et al. Neural conditional random fields , 2010, AISTATS.

[24] Veselin Stoyanov,et al. Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.

[25] Charless C. Fowlkes,et al. Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26] Andrew W. Fitzgibbon,et al. Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[27] Martial Hebert,et al. Learning message-passing inference machines for structured prediction , 2011, CVPR 2011.

[28] Subhransu Maji,et al. Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[29] Vladlen Koltun,et al. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[30] Jitendra Malik,et al. Semantic segmentation using regions and parts , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[32] Tsuhan Chen,et al. Efficient inference for fully-connected CRFs with stationarity , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Gregory Shakhnarovich,et al. Discriminative Re-ranking of Diverse Segmentations , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.

[36] Justin Domke,et al. Learning Graphical Model Parameters with Approximate Marginal Inference , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] Vladlen Koltun,et al. Parameter Learning and Convergent Inference for Dense Random Fields , 2013, ICML.

[38] Rob Fergus,et al. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[39] Jonathan Tompson,et al. Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[40] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[41] Ronan Collobert,et al. Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[42] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[43] Sanja Fidler,et al. The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[44] Jitendra Malik,et al. Simultaneous Detection and Segmentation , 2014, ECCV.

[45] Marc Pollefeys,et al. Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[46] Luc Van Gool,et al. The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[47] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[48] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[49] Peter V. Gehler,et al. Human Pose Estimation with Fields of Parts , 2014, ECCV.

[50] Jian Dong,et al. Towards Unified Object Detection and Semantic Segmentation , 2014, ECCV.

[51] Marc Pollefeys,et al. Discriminatively Trained Dense Surface Normal Estimation , 2014, ECCV.

[52] Geoffrey Zweig,et al. Recurrent conditional random field for language understanding , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[53] Noah Snavely,et al. Material recognition in the wild with the Materials in Context Database , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54] Cristian Sminchisescu,et al. Free-Form Region Description with Second-Order Pooling , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55] Andrew Zisserman,et al. Deep Structured Output Learning for Unconstrained Text Recognition , 2014, ICLR.

[56] Jitendra Malik,et al. Deformable part models are convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57] Gregory Shakhnarovich,et al. Feedforward semantic segmentation with zoom-out features , 2014, CVPR.

[58] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[59] Jian Sun,et al. Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60] Raquel Urtasun,et al. Fully Connected Deep Structured Networks , 2015, ArXiv.

[61] Jitendra Malik,et al. Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62] Rob Fergus,et al. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[63] Alan L. Yuille,et al. Learning Deep Structured Models , 2014, ICML.

[64] George Papandreou,et al. Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[65] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[66] Jian Sun,et al. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[67] Philip H. S. Torr,et al. Higher Order Conditional Random Fields in Deep Neural Networks , 2015, ECCV.

[68] Guosheng Lin,et al. Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.