Learning and Recognition of On-Premise Signs From Weakly Labeled Street View Images

Camera-enabled mobile devices are commonly used as interaction platforms for linking the user's virtual and physical worlds in numerous research and commercial applications, such as serving an augmented reality interface for mobile information retrieval. The various application scenarios give rise to a key technique of daily life visual object recognition. On-premise signs (OPSs), a popular form of commercial advertising, are widely used in our living life. The OPSs often exhibit great visual diversity (e.g., appearing in arbitrary size), accompanied with complex environmental conditions (e.g., foreground and background clutter). Observing that such real-world characteristics are lacking in most of the existing image data sets, in this paper, we first proposed an OPS data set, namely OPS-62, in which totally 4649 OPS images of 62 different businesses are collected from Google's Street View. Further, for addressing the problem of real-world OPS learning and recognition, we developed a probabilistic framework based on the distributional clustering, in which we proposed to exploit the distributional information of each visual feature (the distribution of its associated OPS labels) as a reliable selection criterion for building discriminative OPS models. Experiments on the OPS-62 data set demonstrated the outperformance of our approach over the state-of-the-art probabilistic latent semantic analysis models for more accurate recognitions and less false alarms, with a significant 151.28% relative improvement in the average recognition rate. Meanwhile, our approach is simple, linear, and can be executed in a parallel fashion, making it practical and scalable for large-scale multimedia applications.

[1]  Bernd Girod,et al.  Mobile Visual Search , 2011, IEEE Signal Processing Magazine.

[2]  Trevor Darrell,et al.  Fast concurrent object localization and recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Xing Xie,et al.  Spatial pyramid mining for logo detection in natural scenes , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[4]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Kristen Grauman,et al.  Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds , 2011, CVPR 2011.

[6]  Xiaotie Deng,et al.  Efficient Phrase-Based Document Similarity for Clustering , 2008, IEEE Transactions on Knowledge and Data Engineering.

[7]  Rainer Lienhart,et al.  Scalable logo recognition in real-world images , 2011, ICMR.

[8]  Hyung Jeong Yang,et al.  Automatic detection and recognition of Korean text in outdoor signboard images , 2010, Pattern Recognit. Lett..

[9]  Fei-Fei Li,et al.  OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Nenghai Yu,et al.  Semantics-Preserving Bag-of-Words Models and Applications , 2010, IEEE Transactions on Image Processing.

[12]  Vincent Lepetit,et al.  Gradient Response Maps for Real-Time Detection of Textureless Objects , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Cordelia Schmid,et al.  Correlation-based burstiness for logo retrieval , 2012, ACM Multimedia.

[14]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15]  Olivier Buisson,et al.  Logo retrieval with a contrario visual query expansion , 2009, ACM Multimedia.

[16]  Wen Gao,et al.  PKUBench: A context rich mobile visual search benchmark , 2011, 2011 18th IEEE International Conference on Image Processing.

[17]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[18]  P. K. Chaturvedi,et al.  Communication Systems , 2002, IFIP — The International Federation for Information Processing.

[19]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Andrew McCallum,et al.  Distributional clustering of words for text classification , 1998, SIGIR '98.

[21]  Huizhong Chen,et al.  The stanford mobile visual search data set , 2011, MMSys.

[22]  Chong-Wah Ngo,et al.  Scale-Rotation Invariant Pattern Entropy for Keypoint-Based Near-Duplicate Detection , 2009, IEEE Transactions on Image Processing.

[23]  Christos Faloutsos,et al.  Unsupervised modeling of object categories using link analysis techniques , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Yi-Hsuan Yang,et al.  Unsupervised auxiliary visual words discovery for large-scale image object retrieval , 2011, CVPR 2011.

[25]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Tao Xiang,et al.  Weakly supervised object detector learning with model drift detection , 2011, 2011 International Conference on Computer Vision.

[27]  Mubarak Shah,et al.  Street View Challenge: Identification of Commercial Entities in Street View Imagery , 2011, 2011 10th International Conference on Machine Learning and Applications and Workshops.

[28]  Bernd Girod,et al.  Linking the virtual and physical worlds ) , 2011 .

[29]  Pietro Perona,et al.  Learning Object Categories From Internet Image Searches , 2010, Proceedings of the IEEE.

[30]  Lei Wang,et al.  Where's the Weet-Bix? , 2007, ACCV.

[31]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[32]  Óscar W. Márquez Flórez,et al.  A Communication Perspective on Automatic Text Categorization , 2009, IEEE Transactions on Knowledge and Data Engineering.

[33]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[34]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[35]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[36]  J.-L. Wu,et al.  Video Adaptation for Small Display Based on Content Recomposition , 2007, IEEE Transactions on Circuits and Systems for Video Technology.

[37]  Nicolas Pinto,et al.  Why is Real-World Visual Object Recognition Hard? , 2008, PLoS Comput. Biol..