Learning of Generic Vision Features Using Deep CNN

The performance of a learning algorithm applied to computer vision tasks depends on the features engineered from the image. The premise is that different representations can entangle and capture most of the explanatory factors responsible for variations in images, whether rigid, affine, or projective. Researchers have therefore devoted most of their attention to hand-engineering features that capture these variations. The problem is that doing so requires subtle domain knowledge, so the ideal representation tends to elude researchers and learning algorithms never reach their full potential. Recently there has been a shift from hand-crafting features to representation learning. The resulting features are not only optimal but also generic, in that they can be used as off-the-shelf features for visual recognition tasks. In this paper we design and experiment with a basic deep convolutional neural network for learning generic vision features using a variant of convolving kernels. These kernels give importance to the individual uncorrelated color channels of a color model by convolving each channel with channel-specific kernels. We achieved a considerable improvement in performance even when using a smaller dataset.
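
As a rough illustration of what "convolving each channel with channel-specific kernels" could look like, the following minimal NumPy sketch filters every color channel with its own 2-D kernel and keeps the responses separate instead of summing them across channels. The function name `channelwise_conv2d`, the array shapes, and the random inputs are assumptions made for this example and are not taken from the paper, whose exact architecture and color model are not specified here. As in most CNN libraries, the operation is implemented as cross-correlation (no kernel flip).

```python
import numpy as np

def channelwise_conv2d(image, kernels):
    """Filter each channel of `image` with its own 2-D kernel ("valid" mode).

    image:   array of shape (H, W, C)
    kernels: array of shape (C, kH, kW), one kernel per channel (assumed layout)
    returns: array of shape (H - kH + 1, W - kW + 1, C)
    """
    h, w, c = image.shape
    kc, kh, kw = kernels.shape
    if kc != c:
        raise ValueError("need exactly one kernel per channel")
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((out_h, out_w, c), dtype=float)
    for ch in range(c):
        # Accumulate one shifted, weighted copy of the channel per kernel tap.
        for i in range(kh):
            for j in range(kw):
                out[:, :, ch] += kernels[ch, i, j] * image[i:i + out_h, j:j + out_w, ch]
    return out

# Hypothetical usage: a random 3-channel image and three random 3x3 kernels.
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
kernels = rng.standard_normal((3, 3, 3))
features = channelwise_conv2d(image, kernels)
print(features.shape)  # (6, 6, 3)
```

Unlike a standard convolutional layer, which sums the filtered channels into a single feature map per kernel, this sketch produces one feature map per input channel, so information from each color channel stays separate at this stage.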
