TapTell: Interactive visual search for mobile task recommendation

Abstract: Mobile devices are becoming ubiquitous. People use them as a personal concierge to search for information and make decisions, so understanding user intent and subsequently providing meaningful, personalized suggestions is important. While existing efforts have predominantly focused on understanding the intent expressed by a textual or voice query, this paper presents a new and alternative perspective that understands user intent visually, i.e., via the visual signal captured by the built-in camera. We call this kind of intent “visual intent,” as it can be naturally expressed in a visual form. To accomplish the discovery of visual intent on the phone, we develop TapTell, an exemplar application on Windows Phone 7 that takes advantage of user interaction and rich context to enable interactive visual search and contextual recommendation. Through the TapTell system, a mobile user can take a photo and indicate an object of interest within the photo via different drawing patterns. The system then performs search-based recognition using a proposed large-scale context-embedded vocabulary tree. Finally, contextually relevant entities (i.e., local businesses) are recommended to the user for completing mobile tasks (tasks that naturally arise and are executed when the user is using a mobile device). We evaluated TapTell in a variety of scenarios on millions of images and compared our results to state-of-the-art image retrieval algorithms.
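To make the search-based recognition step concrete, below is a minimal Python sketch of vocabulary-tree retrieval in the general style of Nister and Stewenius: local descriptors are quantized by hierarchical k-means into visual words and database images are scored through an inverted file with TF-IDF weights. The class names, branching factor, depth, and random test data are illustrative assumptions, and the paper's context embedding (e.g., location filtering) and the tap-based region selection are not modeled here; this is not the authors' implementation.

```python
# Hedged sketch: vocabulary-tree image retrieval with TF-IDF scoring.
# Branching factor, depth, and all names are assumptions for illustration.
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

class VocabularyTree:
    def __init__(self, branch_factor=4, depth=3):
        self.k = branch_factor
        self.depth = depth
        self.nodes = {}        # internal node id -> fitted KMeans model
        self.leaf_ids = []     # leaf node ids act as visual word ids

    def fit(self, descriptors):
        """Hierarchical k-means over local descriptors (e.g., SIFT vectors)."""
        self._split(descriptors, node_id=0, level=0)
        return self

    def _split(self, descs, node_id, level):
        if level == self.depth or len(descs) < self.k:
            self.leaf_ids.append(node_id)
            return
        km = KMeans(n_clusters=self.k, n_init=4, random_state=0).fit(descs)
        self.nodes[node_id] = km
        for c in range(self.k):
            child = node_id * self.k + c + 1   # k-ary heap style child ids
            self._split(descs[km.labels_ == c], child, level + 1)

    def quantize(self, descriptors):
        """Push each descriptor down the tree to its leaf (visual word)."""
        words = []
        for d in descriptors:
            node_id = 0
            while node_id in self.nodes:
                c = int(self.nodes[node_id].predict(d.reshape(1, -1))[0])
                node_id = node_id * self.k + c + 1
            words.append(node_id)
        return words

class InvertedIndex:
    """Inverted file over visual words with TF-IDF scoring of database images."""
    def __init__(self, tree):
        self.tree = tree
        self.postings = defaultdict(lambda: defaultdict(int))  # word -> {image: tf}
        self.n_images = 0

    def add(self, image_id, descriptors):
        self.n_images += 1
        for w in self.tree.quantize(descriptors):
            self.postings[w][image_id] += 1

    def query(self, descriptors, top_k=5):
        scores = defaultdict(float)
        for w in self.tree.quantize(descriptors):
            docs = self.postings.get(w, {})
            if not docs:
                continue
            idf = np.log(self.n_images / len(docs))
            for img, tf in docs.items():
                scores[img] += tf * idf
        return sorted(scores.items(), key=lambda x: -x[1])[:top_k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(2000, 128))              # stand-in for SIFT descriptors
    tree = VocabularyTree(branch_factor=4, depth=3).fit(train)
    index = InvertedIndex(tree)
    for i in range(10):
        index.add(f"img_{i}", rng.normal(size=(200, 128)))
    print(index.query(rng.normal(size=(200, 128))))   # ranked (image, score) pairs
```

In the full system, only the descriptors falling inside the user's drawn region of interest would be quantized at query time, and the ranked images would be joined with contextual metadata to produce the final recommendations.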
