DLDR: Deep Linear Discriminative Retrieval for Cultural Event Classification from a Single Image

In this paper we tackle the classification of cultural events from a single image with a deep learning based method. We use convolutional neural networks (CNNs) with VGG-16 architecture [17], pretrained on ImageNet or the Places205 dataset for image classification, and fine-tuned on cultural events data. CNN features are robustly extracted at 4 different layers in each image. At each layer Linear Discriminant Analysis (LDA) is employed for discriminative dimensionality reduction. An image is represented by the concatenated LDA-projected features from all layers or by the concatenation of CNN pooled features at each layer. The classification is then performed through the Iterative Nearest Neighbors-based Classifier (INNC) [20]. Classification scores are obtained for different image representation setups at train and test. The average of the scores is the output of our deep linear discriminative retrieval (DLDR) system. With 0.80 mean average precision (mAP) DLDR is a top entry for the ChaLearn LAP 2015 cultural event recognition challenge.

[1]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Yu Qiao,et al.  Object-Scene Convolutional Neural Networks for event recognition in images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[4]  Luc Van Gool,et al.  Iterative Nearest Neighbors for classification and dimensionality reduction , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[6]  Jiawei Han,et al.  SRDA: An Efficient Algorithm for Large-Scale Discriminant Analysis , 2008, IEEE Transactions on Knowledge and Data Engineering.

[7]  Luc Van Gool,et al.  Iterative Nearest Neighbors , 2015, Pattern Recognit..

[8]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[9]  Dimitris Samaras,et al.  Recognizing cultural events in images: A study of image categorization models , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[10]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[12]  Luc Van Gool,et al.  Sparse Representation Based Projections , 2011, BMVC.

[13]  Limin Wang,et al.  Places205-VGGNet Models for Scene Recognition , 2015, ArXiv.

[14]  Amaia Salvador,et al.  Cultural Event recognition with visual ConvNets and temporal models , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[15]  Nojun Kwak,et al.  Cultural event recognition by subregion classification with convolutional neural network , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[16]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[17]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[18]  Sergio Escalera,et al.  ChaLearn Looking at People 2015 challenges: Action spotting and cultural event recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[19]  J. Friedman Regularized Discriminant Analysis , 1989 .

[20]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[21]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[22]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Minh Hoai,et al.  Regularized Max Pooling for Image Categorization , 2014, BMVC.