Large-Scale 3D Scene Classification With Multi-View Volumetric CNN

We introduce a method to classify imagery using a convo- lutional neural network (CNN) on multi-view image pro- jections. The power of our method comes from using pro- jections of multiple images at multiple depth planes near the reconstructed surface. This enables classification of categories whose salient aspect is appearance change un- der different viewpoints, such as water, trees, and other materials with complex reflection/light response proper- ties. Our method does not require boundary labelling in images and works on pixel-level classification with a small (few pixels) context, which simplifies the cre- ation of a training set. We demonstrate this application on large-scale aerial imagery collections, and extend the per-pixel classification to robustly create a consistent 2D classification which can be used to fill the gaps in non- reconstructible water regions. We also apply our method to classify tree regions. In both cases, the training data can quickly be generated using a small number of manually- created polygons on a map. We show that even with a very simple and standard network our CNN outperforms the state-of-the-art image classification, the Inception-V3 model retrained from a large collection of aerial images.

[1]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[2]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[7]  Trevor Darrell,et al.  Fully Convolutional Multi-Class Multiple Instance Learning , 2014, ICLR.

[8]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[10]  Fei-Fei Li,et al.  What's the Point: Semantic Segmentation with Point Supervision , 2015, ECCV.

[11]  Bo Han,et al.  TouchCut: Fast image and video segmentation using single-touch interaction , 2014, Comput. Vis. Image Underst..

[12]  Ning Qian,et al.  On the momentum term in gradient descent learning algorithms , 1999, Neural Networks.

[13]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Marc Pollefeys,et al.  Joint 3D Scene Reconstruction and Class Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[17]  Vangelis Metsis,et al.  Spam Filtering with Naive Bayes - Which Naive Bayes? , 2006, CEAS.

[18]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[20]  Leonidas J. Guibas,et al.  Volumetric and Multi-view CNNs for Object Classification on 3D Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).