Paper Information - DAG-Recurrent Neural Networks for Scene Labeling

In image labeling, the local representation of each image unit is usually computed from its surrounding image patch, so long-range contextual information is not effectively encoded. In this paper, we introduce recurrent neural networks (RNNs) to address this issue. Specifically, we propose directed acyclic graph RNNs (DAG-RNNs) to process DAG-structured images, which enables the network to model long-range semantic dependencies among image units. DAG-RNNs substantially enhance the discriminative power of local representations, which in turn benefits local classification. We also propose a novel class weighting function that attends to rare classes and markedly improves recognition accuracy for infrequent classes. Integrated with convolution and deconvolution layers, our DAG-RNNs achieve new state-of-the-art results on the challenging SiftFlow, CamVid and Barcelona benchmarks.
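
To make the idea of an RNN over a DAG-structured image concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the graph construction or the recurrence, so several things here are assumptions chosen for illustration: a single southeast-directed DAG over a grid of image units, predecessors taken as the top, left and top-left neighbours, predecessor hidden states combined by summation, and a tanh nonlinearity. The function name `dag_rnn_southeast` and the parameters `U`, `W`, `b` are hypothetical.

```python
import math


def dag_rnn_southeast(x, U, W, b):
    """Forward pass of a plain RNN over a southeast-directed grid DAG.

    x: rows x cols grid of input feature vectors (lists of floats, length d_in)
    U: d_h x d_in input weights, W: d_h x d_h recurrent weights, b: d_h bias
    Assumption: each unit's predecessors are its top, left and top-left
    neighbours, and their hidden states are summed before the recurrence.
    """
    rows, cols = len(x), len(x[0])
    d_h = len(b)
    h = [[None] * cols for _ in range(rows)]
    # Row-major order is a topological order for this DAG: every
    # predecessor of (i, j) has been computed before (i, j) is visited.
    for i in range(rows):
        for j in range(cols):
            agg = [0.0] * d_h
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0:  # skip predecessors outside the grid
                    agg = [a + hv for a, hv in zip(agg, h[pi][pj])]
            h[i][j] = [
                math.tanh(
                    sum(U[k][m] * x[i][j][m] for m in range(len(x[i][j])))
                    + sum(W[k][m] * agg[m] for m in range(d_h))
                    + b[k]
                )
                for k in range(d_h)
            ]
    return h


# Usage: only the top-left unit has a non-zero input, yet the
# bottom-right hidden state is non-zero because context flows along the DAG.
x = [[[1.0], [0.0]], [[0.0], [0.0]]]
h = dag_rnn_southeast(x, U=[[1.0]], W=[[1.0]], b=[0.0])
print(h[1][1][0] != 0.0)
```

Because information only flows toward the southeast in this sketch, a unit sees context from units above and to its left only; covering all directions would need additional DAGs with other orientations, which is outside what this example shows.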
