DeepBag: Recognizing Handbag Models

In this paper, we address the problem of branded handbag recognition. The problem is challenging because of non-rigid deformation, illumination changes, and inter-class similarity. We propose a novel framework based on a deep convolutional neural network (CNN). Concretely, we propose a new CNN model, called the feature selective joint classification-regression CNN (FSCR-CNN). Its advantages are twofold: 1) it alleviates the effect of illumination changes through a feature selection strategy that focuses network learning on color-nondiscriminative features, and 2) rather than targeting only the hard label (i.e., the handbag model), it also incorporates a soft label (i.e., a distribution measuring the similarity between the ground-truth model and all the models to be trained) into the loss function used to train the CNN, which leads to a better classifier for handbags with large inter-class similarity. We evaluate our framework on a newly built branded handbag dataset, on which it recognizes handbags with 94.48% accuracy. We also apply the proposed FSCR-CNN model to the recognition of other fine-grained objects using state-of-the-art CNN architectures, where it achieves an improvement of over 5% in accuracy.
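To make the idea of combining a hard label with a soft label concrete, the following is a minimal sketch of one possible joint loss, not the paper's actual formulation, which the abstract does not specify. It assumes a cross-entropy term on the hard label plus a squared-error regression term between the predicted class distribution and the soft label distribution. The function names `softmax` and `joint_loss` and the weighting parameter `lam` are hypothetical, introduced only for illustration.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def joint_loss(logits, hard_label, soft_label, lam=0.5):
    """Illustrative joint classification-regression loss (an assumption).

    logits:     (n, k) raw network outputs for n images and k handbag models
    hard_label: (n,)   integer index of the ground-truth model
    soft_label: (n, k) distribution describing how similar the ground-truth
                model is to every model (each row sums to 1)
    lam:        hypothetical weight balancing the two terms
    """
    p = softmax(logits)
    n = logits.shape[0]
    eps = 1e-12  # avoids log(0)

    # Classification term: cross-entropy against the hard label.
    ce_hard = -np.log(p[np.arange(n), hard_label] + eps)

    # Regression term: squared error between the predicted distribution
    # and the soft label distribution.
    reg_soft = np.sum((p - soft_label) ** 2, axis=1)

    return float(np.mean(ce_hard + lam * reg_soft))


if __name__ == "__main__":
    logits = np.array([[2.0, 1.0, 0.1]])
    hard = np.array([0])
    # Made-up soft label: model 0 is the ground truth, model 1 looks similar.
    soft = np.array([[0.7, 0.25, 0.05]])
    print(joint_loss(logits, hard, soft, lam=0.5))
```

With `lam=0` the sketch reduces to ordinary cross-entropy training. Increasing `lam` penalizes predictions whose distribution over visually similar models departs from the soft label, which is the intuition the abstract gives for handling large inter-class similarity.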
