Annotation for free: video tagging by mining user search behavior

The problem of tagging is mostly considered from the perspectives of machine learning and data-driven philosophy. A fundamental issue that underlies the success of these approaches is the visual similarity, ranging from the nearest neighbor search to manifold learning, to identify similar instances of an example for tag completion. The need to searching for millions of visual examples in high-dimensional feature space, however, makes the task computationally expensive. Moreover, the results can suffer from robustness problem, when the underlying data, such as online videos, are rich of semantics and the similarity is difficult to be learnt from low-level features. This paper studies the exploration of user searching behavior through click-through data, which is largely available and freely accessible by search engines, for learning video relationship and applying the relationship for economic way of annotating online videos. We demonstrated that, by a simple approach using co-click statistics, promising results were obtained in contrast to feature-based similarity measurement. Furthermore, considering the long tail effect that few videos dominate most clicks, a new method based on~polynomial~semantic indexing is proposed to learn a latent space~for alleviating the sparsity problem of click-through data. The proposed approaches are then applied for three major tasks in tagging: tag assignment, ranking, and enrichment. On~a bipartite graph constructed from click-through data with~over 15 million queries and 20 million video URL clicks,~we showed that annotation can be performed for free with competitive performance and minimum computing resource, representing a new and promising paradigm for video tagging in addition to machine learning and data-driven methodologies.

[1]  B. S. Manjunath,et al.  Video Annotation Through Search and Graph Reinforcement Mining , 2010, IEEE Transactions on Multimedia.

[2]  Ralf Herbrich,et al.  Large margin rank boundaries for ordinal regression , 2000 .

[3]  Filip Radlinski,et al.  Active exploration for learning rankings from clickthrough data , 2007, KDD '07.

[4]  Ben Carterette,et al.  Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks , 2007, NIPS.

[5]  Benjamin Piwowarski,et al.  A user browsing model to predict search engine click data from past observations. , 2008, SIGIR '08.

[6]  Yanjun Qi,et al.  Polynomial Semantic Indexing , 2009, NIPS.

[7]  Shih-Fu Chang,et al.  Active Context-Based Concept Fusionwith Partial User Labels , 2006, 2006 International Conference on Image Processing.

[8]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[9]  Filip Radlinski,et al.  Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search , 2007, TOIS.

[10]  Kenneth Ward Church,et al.  Query suggestion using hitting time , 2008, CIKM '08.

[11]  Xiao Li,et al.  Learning query intent from regularized click graphs , 2008, SIGIR '08.

[12]  Michael R. Lyu,et al.  Learning latent semantic relations from clickthrough data for query suggestion , 2008, CIKM '08.

[13]  Luca Chiarandini,et al.  Image ranking based on user browsing behavior , 2012, SIGIR '12.

[14]  Ricardo Baeza-Yates,et al.  Query-sets: using implicit feedback and query patterns to organize web documents , 2008, WWW.

[15]  Nick Craswell,et al.  Random walks on the click graph , 2007, SIGIR.

[16]  Vidit Jain,et al.  Learning to re-rank: query-dependent image re-ranking using click data , 2011, WWW.

[17]  Ryen W. White,et al.  Mining the search trails of surfing crowds: identifying relevant websites from user activity , 2008, WWW.

[18]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[19]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[20]  Benjamin Rey,et al.  Generating query substitutions , 2006, WWW '06.

[21]  Mark Sanderson,et al.  Automatic video tagging using content redundancy , 2009, SIGIR.

[22]  Roman Rosipal,et al.  Overview and Recent Advances in Partial Least Squares , 2005, SLSFS.

[23]  Ji-Rong Wen,et al.  Clustering user queries of a search engine , 2001, WWW '01.

[24]  Doug Beeferman,et al.  Agglomerative clustering of a search engine query log , 2000, KDD '00.

[25]  Meng Wang,et al.  Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation , 2009, IEEE Transactions on Multimedia.

[26]  Wei-Ying Ma,et al.  Probabilistic query expansion using query logs , 2002, WWW '02.

[27]  Ricardo A. Baeza-Yates,et al.  Extracting semantic relations from query logs , 2007, KDD '07.

[28]  Tao Mei,et al.  Correlative multi-label video annotation , 2007, ACM Multimedia.

[29]  Filip Radlinski,et al.  How does clickthrough data reflect retrieval quality? , 2008, CIKM '08.

[30]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[31]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..