Logo detection using weakly supervised saliency map

Box-level annotation of a large number of logo images for training a typical deep learning architecture is highly challenging. A method that can be trained to detect logos without box-level annotations is therefore useful. In this paper, we present a logo detection method that uses weakly supervised learning of a Convolutional Neural Network (CNN) to generate a deep saliency map. The saliency map is generated from the back-propagated response of a CNN trained on the classification task alone, and it responds strongly in the regions containing logos. The GrabCut segmentation method is then applied to obtain the bounding box corresponding to the logo class predicted by the CNN for a given image. The AlexNet, CaffeNet, and VGGNet deep architectures have been fine-tuned for classification, and the same framework is then used for detection through the back-propagated saliency map. The performance of the proposed methodology has been validated on the FlickrLogos-32 logo benchmark dataset. The proposed method achieves a mean average precision (mAP) of 75.83%, outperforming the state-of-the-art fully supervised baseline methods.
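The pipeline described above (classify, back-propagate to get a saliency map, then extract a box) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy linear classifier in place of a fine-tuned CNN, so the gradient of a class score with respect to the input is simply that class's weight tensor, and it substitutes plain thresholding for GrabCut. The function names `saliency_from_gradient` and `box_from_saliency` and the `rel_threshold` parameter are hypothetical.

```python
import numpy as np

def saliency_from_gradient(grad):
    """Collapse an input gradient of shape (H, W, C) into an (H, W) saliency
    map by taking the maximum absolute value across channels."""
    return np.abs(grad).max(axis=2)

def box_from_saliency(saliency, rel_threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) enclosing all pixels whose saliency
    is at least rel_threshold times the maximum.
    Assumption: simple thresholding stands in for GrabCut here."""
    mask = saliency >= rel_threshold * saliency.max()
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy stand-in for a trained classifier (assumption, not a CNN):
# score_c(x) = sum(weights[c] * x), so d(score_c)/dx = weights[c].
rng = np.random.default_rng(0)
height, width, channels = 32, 32, 3
weights = 0.01 * rng.standard_normal((2, height, width, channels))
weights[1, 8:16, 10:24, :] += 1.0  # class 1 responds to a synthetic "logo" region

image = rng.random((height, width, channels))
scores = (weights * image).sum(axis=(1, 2, 3))
predicted = int(scores.argmax())

# Back-propagated response for the predicted class, reduced to a saliency map.
saliency = saliency_from_gradient(weights[predicted])
box = box_from_saliency(saliency)
print(predicted, box)
```

With a real network, the gradient would come from automatic differentiation of the predicted class score with respect to the input image, and the thresholded mask could serve as the initial foreground estimate for a GrabCut step rather than as the final segmentation.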
