Sparse Coding on Stereo Video for Object Detection

Deep Convolutional Neural Networks (DCNN) require millions of labeled training examples for image classification and object detection tasks, which restrict these models to domains where such datasets are available. In this paper, we explore the use of unsupervised sparse coding applied to stereo-video data to help alleviate the need for large amounts of labeled data. We show that replacing a typical supervised convolutional layer with an unsupervised sparse-coding layer within a DCNN allows for better performance on a car detection task when only a limited number of labeled training examples is available. Furthermore, the network that incorporates sparse coding allows for more consistent performance over varying initializations and ordering of training examples when compared to a fully supervised DCNN. Finally, we compare activations between the unsupervised sparse-coding layer and the supervised convolutional layer, and show that the sparse representation exhibits an encoding that is depth selective, whereas encodings from the convolutional layer do not exhibit such selectivity. These result indicates promise for using unsupervised sparse-coding approaches in real-world computer vision tasks in domains with limited labeled training data.

[1]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[2]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[3]  Garrett T. Kenyon,et al.  Sparse encoding of binocular images for depth inference , 2016, 2016 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI).

[4]  R. Memisevic,et al.  Stereopsis via deep learning , 2013 .

[5]  Richard G. Baraniuk,et al.  Locally Competitive Algorithms for Sparse Approximation , 2007, 2007 IEEE International Conference on Image Processing.

[6]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[11]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[12]  P O Hoyer,et al.  Independent component analysis applied to feature extraction from colour and stereo images , 2000, Network.

[13]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[14]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[15]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[16]  Gabriel Kreiman,et al.  Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning , 2016, ICLR.

[17]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Zhengya Zhang,et al.  A Sparse Coding Neural Network ASIC With On-Chip Learning for Feature Extraction and Encoding , 2015, IEEE Journal of Solid-State Circuits.

[19]  Bruno A. Olshausen,et al.  Sparse Coding Of Time-Varying Natural Images , 2010 .

[20]  S. H. Ferris Motion parallax and absolute distance. , 1972, Journal of experimental psychology.

[21]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[22]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[23]  N. Qian Binocular Disparity and the Perception of Depth , 1997, Neuron.

[24]  Christopher J. Rozell,et al.  Visual Nonclassical Receptive Field Effects Emerge from Sparse Coding in a Dynamical System , 2013, PLoS Comput. Biol..

[25]  Terrence J. Sejnowski,et al.  Unsupervised Learning , 2018, Encyclopedia of GIS.