CubeLSI: An effective and efficient method for searching resources in social tagging systems

In a social tagging system, resources (such as photos, videos and web pages) are associated with tags. These tags allow the resources to be searched effectively through tag-based keyword matching using traditional IR techniques. We note that in many such systems, the tags of a resource are often assigned by a diverse audience of casual users (taggers). This leads to two issues that gravely affect the effectiveness of resource retrieval: (1) Noise: tags are picked from an uncontrolled vocabulary and are assigned by untrained taggers, so they are noisy features for resource retrieval. (2) A multitude of aspects: different taggers focus on different aspects of a resource, and representing a resource as a flattened bag of tags ignores this important diversity of taggers. To improve the effectiveness of resource retrieval in social tagging systems, we propose CubeLSI, a technique that extends traditional LSI to include taggers as another dimension of the feature space of resources. We compare CubeLSI against a number of other tag-based retrieval models and show that CubeLSI significantly outperforms them in retrieval accuracy. We also prove two theorems that allow CubeLSI to be computed very efficiently despite the much enlarged feature space it employs.
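The abstract does not spell out how the tagger dimension is folded into LSI. As a minimal sketch, assuming the data is stored as a 0/1 third-order tensor indexed by (tagger, tag, resource) and that "extending LSI" is read as replacing LSI's truncated matrix SVD with a truncated higher-order SVD (HOSVD, a Tucker-style decomposition), the code below reduces the tensor and then compares tags by the Frobenius distance between their tagger × resource slices. This is an illustration of the general idea, not the paper's exact procedure. The helper names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`, `tag_distances`), the ranks, and the toy triples are all hypothetical. Note that it materializes the full reconstructed tensor, which is precisely the kind of cost the abstract's efficiency theorems are said to avoid.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (shape: new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Truncated HOSVD: keep the leading left singular vectors of each unfolding."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    # Core tensor: project every mode onto its retained subspace.
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Map the core back to the original (tagger, tag, resource) space."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx

def tag_distances(approx):
    """Frobenius distance between the tagger x resource slices of each pair of tags."""
    n_tags = approx.shape[1]
    dist = np.zeros((n_tags, n_tags))
    for i in range(n_tags):
        for j in range(n_tags):
            dist[i, j] = np.linalg.norm(approx[:, i, :] - approx[:, j, :])
    return dist

# Hypothetical toy data: (tagger, tag, resource) assignments.
triples = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1), (2, 2, 2), (2, 3, 3), (1, 2, 4)]
F = np.zeros((3, 4, 5))
for tagger, tag, resource in triples:
    F[tagger, tag, resource] = 1.0

core, factors = truncated_hosvd(F, ranks=(2, 2, 2))
D = tag_distances(reconstruct(core, factors))
print(np.round(D, 3))
```

In this toy data, tags 0 and 1 are always applied together by the same taggers to the same resources, so their slices stay identical after the reduction and their distance is (numerically) zero. A real system would use sparse storage and would avoid building the dense reconstructed tensor.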
