Instant Mobile Video Search With Layered Audio-Video Indexing and Progressive Transmission

The proliferation of mobile devices is producing a new wave of applications that enable users to sense their surroundings with smartphones. People increasingly prefer mobile devices for searching and browsing video content on the move. In this paper, we develop an innovative mobile video search system through which users can discover videos by simply pointing their phones at a screen and capturing a few seconds of what they are watching. Unlike most existing mobile video search applications, the proposed system aims at instant and progressive video search by leveraging the lightweight computing capacity of mobile devices. In particular, the system indexes large-scale video data in the cloud using a new layered audio-video indexing approach, while on the mobile device it generates lightweight joint audio-video signatures, transmits them progressively, and performs progressive search. Furthermore, we show that the system can be applied to two novel applications: video entity search and video clip localization. Evaluations on a real-world mobile video query dataset show that our system significantly improves the user's search experience through its search accuracy, low retrieval latency, and very short recording duration.
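The idea of progressive search can be illustrated with a toy sketch. The code below is not the system described in this paper: it assumes each second of recording yields one integer-encoded binary signature, replaces the layered audio-video index with a plain dictionary, and stops as soon as one video leads the runner-up by a fixed vote margin. The names `progressive_search`, `max_dist`, and `margin` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def progressive_search(index, query_chunks, max_dist=2, margin=2):
    """Consume query signatures one at a time and stop early when confident.

    index: dict mapping video id -> list of binary signatures (ints).
    query_chunks: iterable of signatures, one per second of recording.
    Returns (best_video_id_or_None, number_of_chunks_consumed).
    """
    votes = Counter()
    used = 0
    for used, sig in enumerate(query_chunks, start=1):
        # A video gets one vote if any of its signatures is close to this chunk.
        for video_id, signatures in index.items():
            if any(hamming(sig, s) <= max_dist for s in signatures):
                votes[video_id] += 1
        ranked = votes.most_common(2)
        if ranked:
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Assumed stopping rule: the leader is ahead by at least `margin` votes.
            if ranked[0][1] - runner_up >= margin:
                return ranked[0][0], used
    best = votes.most_common(1)
    return (best[0][0] if best else None), used


# Toy data: two videos that share their first signature.
toy_index = {
    "video_a": [0b11110000, 0b10101010, 0b00001111],
    "video_b": [0b11110000, 0b01010101, 0b00110011],
}
# Noisy recording of video_a; the fourth chunk is never needed.
query = [0b11110001, 0b10101000, 0b00001111, 0b11111111]
print(progressive_search(toy_index, query))  # ('video_a', 3)
```

In this toy run the first chunk matches both videos, so the search keeps going; by the third chunk the lead is large enough to return a result without the remaining input. This mirrors, in a very simplified form, why progressive transmission can shorten the recording a user has to make.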
