论文信息 - Deep Deconvolutional Networks for Scene Parsing

Deep Deconvolutional Networks for Scene Parsing

Scene parsing is an important and challenging prob- lem in computer vision. It requires labeling each pixel in an image with the category it belongs to. Tradition- ally, it has been approached with hand-engineered features from color information in images. Recently convolutional neural networks (CNNs), which automatically learn hierar- chies of features, have achieved record performance on the task. These approaches typically include a post-processing technique, such as superpixels, to produce the final label- ing. In this paper, we propose a novel network architecture that combines deep deconvolutional neural networks with CNNs. Our experiments show that deconvolutional neu- ral networks are capable of learning higher order image structure beyond edge primitives in comparison to CNNs. The new network architecture is employed for multi-patch training, introduced as part of this work. Multi-patch train- ing makes it possible to effectively learn spatial priors from scenes. The proposed approach yields state-of-the-art per- formance on four scene parsing datasets, namely Stanford Background, SIFT Flow, CamVid, and KITTI. In addition, our system has the added advantage of having a training system that can be completely automated end-to-end with- out requiring any post-processing.

Rahul Mohan | R. Mohan

[1] R. Vaillant,et al. Original approach for the localisation of objects in images , 1994 .

[2] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[3] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[4] Thomas Hofmann,et al. Efficient Learning of Sparse Representations with an Energy-Based Model , 2007 .

[5] Marc'Aurelio Ranzato,et al. Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[6] Bill Triggs,et al. Scene Segmentation with CRFs Learned from Partially Labeled Images , 2007, NIPS.

[7] George Loizou,et al. Computer vision and pattern recognition , 2007, Int. J. Comput. Math..

[8] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Roberto Cipolla,et al. Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[10] Cordelia Schmid,et al. Object Recognition by Integrating Multiple Image Segmentations , 2008, ECCV.

[11] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[12] Antonio Criminisi,et al. Object Class Segmentation using Random Forests , 2008, BMVC.

[13] Guillermo Sapiro,et al. Supervised Dictionary Learning , 2008, NIPS.

[14] Philip H. S. Torr,et al. Combining Appearance and Structure from Motion Features for Road Scene Understanding , 2009, BMVC.

[15] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[16] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17] R. Fergus,et al. Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Pushmeet Kohli,et al. Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19] Stephen Gould,et al. Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20] Philip H. S. Torr,et al. What, Where and How Many? Combining Object Detectors and CRFs , 2010, ECCV.

[21] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[22] Martial Hebert,et al. Stacked Hierarchical Labeling , 2010, ECCV.

[23] Svetlana Lazebnik,et al. Superparsing , 2010, International Journal of Computer Vision.

[24] Yann LeCun,et al. Convolutional networks and applications in vision , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[25] Charless C. Fowlkes,et al. Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26] Bastian Leibe,et al. Multi-Class Image Labeling with Top-Down Segmentation and Generalized Robust $P^N$ Potentials , 2011, BMVC.

[27] Andrew Y. Ng,et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[28] Antonio Torralba,et al. SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[30] Graham W. Taylor,et al. Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[31] Yann LeCun,et al. Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers , 2012, ICML.

[32] Franz Kummert,et al. Spatial ray features for real-time ego-lane extraction , 2012, 2012 15th International IEEE Conference on Intelligent Transportation Systems.

[33] C. V. Jawahar,et al. Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.

[34] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[35] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36] Svetlana Lazebnik,et al. Finding Things: Image Parsing with Regions and Per-Exemplar Detectors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37] Giovani Bernardes Vitor,et al. A probabilistic distribution approach for the classification of urban roads in complex environments , 2014 .

[38] Ronan Collobert,et al. Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[39] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[40] Denis Fernando Wolf,et al. Road terrain detection: Avoiding common obstacle detection assumptions using sensor fusion , 2014, 2014 IEEE Intelligent Vehicles Symposium Proceedings.