Paper information - PAID I IMPROVING OBJECT CLASSIFICATION USING POSE INFORMATION


We propose a method that exploits pose information to improve object classification. Much prior research has focused on other strategies, such as engineering feature extractors, trying different classifiers, and even using transfer learning. Here, we use neural network architectures in a multi-task setup, whose outputs predict both the object class and the camera azimuth. We investigate both Multi-layer Perceptron and Convolutional Neural Network architectures, and achieve state-of-the-art results on the challenging NORB dataset.
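The multi-task setup described in the abstract can be illustrated with a minimal sketch: one shared hidden layer feeds two output heads, one for the object class and one for the camera azimuth, and the two losses are summed. This is not the authors' implementation. The layer sizes, the treatment of azimuth as an 18-bin classification problem, the `pose_weight` parameter, and all function names are assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(params, x):
    """Shared hidden layer followed by two task-specific softmax heads."""
    hidden = [math.tanh(h) for h in linear(*params["shared"], x)]
    class_probs = softmax(linear(*params["class_head"], hidden))
    azimuth_probs = softmax(linear(*params["azimuth_head"], hidden))
    return class_probs, azimuth_probs


def multitask_loss(params, x, class_label, azimuth_label, pose_weight=1.0):
    """Sum of the two cross-entropy terms; pose_weight is an assumed knob."""
    class_probs, azimuth_probs = forward(params, x)
    class_loss = -math.log(class_probs[class_label])
    azimuth_loss = -math.log(azimuth_probs[azimuth_label])
    return class_loss + pose_weight * azimuth_loss


# Assumed sizes: a toy 16-dimensional input, 5 classes, 18 azimuth bins.
N_INPUTS, N_HIDDEN, N_CLASSES, N_AZIMUTHS = 16, 8, 5, 18

rng = random.Random(0)
params = {
    "shared": init_layer(N_HIDDEN, N_INPUTS, rng),
    "class_head": init_layer(N_CLASSES, N_HIDDEN, rng),
    "azimuth_head": init_layer(N_AZIMUTHS, N_HIDDEN, rng),
}

x = [rng.random() for _ in range(N_INPUTS)]  # stand-in for image features
class_probs, azimuth_probs = forward(params, x)
loss = multitask_loss(params, x, class_label=2, azimuth_label=7)
```

Because both heads read the same hidden representation, gradients from the azimuth term would also shape the features used for classification, which is the general idea behind multi-task training. Setting `pose_weight=0.0` in this sketch recovers a plain single-task classification loss.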

David Grangier | Ronan Collobert | F. Fleuret | Hugo Penedones
