Large Scale Business Discovery from Street Level Imagery

Search with local intent is becoming increasingly useful due to the popularity of the mobile device. The creation and maintenance of accurate listings of local businesses worldwide is time consuming and expensive. In this paper, we propose an approach to automatically discover businesses that are visible on street level imagery. Precise business store front detection enables accurate geo-location of businesses, and further provides input for business categorization, listing generation, etc. The large variety of business categories in different countries makes this a very challenging problem. Moreover, manual annotation is prohibitive due to the scale of this problem. We propose the use of a MultiBox based approach that takes input image pixels and directly outputs store front bounding boxes. This end-to-end learning approach instead preempts the need for hand modeling either the proposal generation phase or the post-processing phase, leveraging large labelled training datasets. We demonstrate our approach outperforms the state of the art detection techniques with a large margin in terms of performance and run-time efficiency. In the evaluation, we show this approach achieves human accuracy in the low-recall settings. We also provide an end-to-end evaluation of business discovery in the real world.

[1]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[2]  Gary Carpenter 동적 사용자를 위한 Scalable 인증 그룹 키 교환 프로토콜 , 2005 .

[3]  Luc Van Gool,et al.  3D Urban Scene Modeling Integrating Recognition and Reconstruction , 2008, International Journal of Computer Vision.

[4]  Jana Kosecka,et al.  Piecewise planar city 3D modeling from street view panoramic sequences , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[6]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[7]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Dumitru Erhan,et al.  Deep Neural Networks for Object Detection , 2013, NIPS.

[9]  Kunihiko Fukushima,et al.  Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[10]  John B. Shoven,et al.  I , Edinburgh Medical and Surgical Journal.

[11]  R. Stephenson A and V , 1962, The British journal of ophthalmology.

[12]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[13]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[15]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[18]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Neural Networks , 2013 .

[19]  Yaroslav Bulatov,et al.  Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks , 2013, ICLR.

[20]  Yong Jae Lee,et al.  Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[22]  Jana Kosecka,et al.  Piecewise planar city 3D modeling from street view panoramic sequences , 2009, CVPR.

[23]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[24]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[25]  Luc Vincent,et al.  Taking Online Maps Down to Street Level , 2007, Computer.

[26]  Jianxiong Xiao,et al.  Multiple view semantic segmentation for street view images , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[27]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[28]  Yair Movshovitz-Attias,et al.  Ontological supervision for fine grained classification of Street View storefronts , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Christian Früh,et al.  Google Street View: Capturing the World at Street Level , 2010, Computer.

[30]  Dumitru Erhan,et al.  Scalable, High-Quality Object Detection , 2014, ArXiv.

[31]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[32]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.