Improving Object Classification using Pose Information

We propose a method that exploits pose information in order to improve object classification. A lot of research has focused in other strategies, such as engineering feature extractors, trying different classifiers and even using transfer learning. Here, we use neural network architectures in a multi-task setup, whose outputs predict both the class and the camera azimuth. We investigate both Multi-layer Perceptrons and Convolutional Neural Network architectures, and achieve state-of-the-art results in the challenging NORB dataset.

[1]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2]  Geoffrey E. Hinton,et al.  Restricted Boltzmann machines for collaborative filtering , 2007, ICML '07.

[3]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[4]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[5]  Geoffrey E. Hinton,et al.  3D Object Recognition with Deep Belief Nets , 2009, NIPS.

[6]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[7]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[8]  Mei-Chen Yeh,et al.  Fast Human Detection Using a Cascade of Histograms of Oriented Gradients , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Sven Behnke,et al.  Large-scale object recognition with CUDA-accelerated hierarchical neural networks , 2009, 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems.

[10]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[11]  Yoav Freund,et al.  A Short Introduction to Boosting , 1999 .

[12]  Yann LeCun,et al.  Synergistic Face Detection and Pose Estimation with Energy-Based Models , 2004, J. Mach. Learn. Res..

[13]  Hossein Mobahi,et al.  Deep learning from temporal coherence in video , 2009, ICML '09.

[14]  L. Bottou Stochastic Gradient Learning in Neural Networks , 1991 .

[15]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[16]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[18]  D. Geman,et al.  Stationary Features and Cat Detection , 2008 .

[19]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[20]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[21]  Luca Maria Gambardella,et al.  High-Performance Neural Networks for Visual Object Classification , 2011, ArXiv.

[22]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.