Automatic tag recommendation algorithms for social recommender systems

The emergence of Web 2.0 and the consequent success of social network Web sites such as Del.icio.us and Flickr introduce us to a new concept called social bookmarking, or tagging. Tagging is the action of connecting a relevant user-defined keyword to a document, image, or video, which helps the user to better organize and share their collections of interesting stuff. With the rapid growth of Web 2.0, tagged data is becoming more and more abundant on the social network Web sites. An interesting problem is how to automate the process of making tag recommendations to users when a new resource becomes available. In this article, we address the issue of tag recommendation from a machine learning perspective. From our empirical observation of two large-scale datasets, we first argue that the user-centered approach for tag recommendation is not very effective in practice. Consequently, we propose two novel document-centered approaches that are capable of making effective and efficient tag recommendations in real scenarios. The first, graph-based, method represents the tagged data in two bipartite graphs, (document, tag) and (document, word), then finds document topics by leveraging graph partitioning algorithms. The second, prototype-based, method aims at finding the most representative documents within the data collections and advocates a sparse multiclass Gaussian process classifier for efficient document classification. For both methods, tags are ranked within each topic cluster/class by a novel ranking method. Recommendations are performed by first classifying a new document into one or more topic clusters/classes, and then selecting the most relevant tags from those clusters/classes as machine-recommended tags. Experiments on real-world data from Del.icio.us, CiteULike, and BibSonomy examine the quality of tag recommendation as well as the efficiency of our recommendation algorithms. The results suggest that our document-centered models can substantially improve the performance of tag recommendations when compared to the user-centered methods, as well as topic models LDA and SVM classifiers.

[1]  Neil D. Lawrence,et al.  Fast Forward Selection to Speed Up Sparse Gaussian Process Regression , 2003, AISTATS.

[2]  Chris H. Q. Ding,et al.  Bipartite graph partitioning and data clustering , 2001, CIKM '01.

[3]  Tong Zhang,et al.  On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning , 2007, J. Mach. Learn. Res..

[4]  Yang Song,et al.  Social Bookmarking for Scholarly Digital Libraries , 2007, IEEE Internet Computing.

[5]  Eyke Hüllermeier,et al.  A Unified Model for Multilabel Classification and Ranking , 2006, ECAI.

[6]  Grigory Begelman,et al.  Automated Tag Clustering: Improving search and exploration in the tag space , 2006 .

[7]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[8]  Yang Song,et al.  Real-time automatic tag recommendation , 2008, SIGIR '08.

[9]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[10]  Hongyuan Zha,et al.  Computational Statistics Data Analysis , 2021 .

[11]  Michael I. Jordan,et al.  Sparse Gaussian Process Classification With Multiple Classes , 2004 .

[12]  Antal van den Bosch,et al.  Recommending scientific articles using citeulike , 2008, RecSys '08.

[13]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[14]  Klaus Obermayer,et al.  Soft nearest prototype classification , 2003, IEEE Trans. Neural Networks.

[15]  Anil K. Jain,et al.  Unsupervised Learning of Finite Mixture Models , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  M. Kendall A NEW MEASURE OF RANK CORRELATION , 1938 .

[17]  Gene H. Golub,et al.  Matrix computations (3rd ed.) , 1996 .

[18]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[19]  Panagiotis Symeonidis,et al.  Tag recommendations based on tensor dimensionality reduction , 2008, RecSys '08.

[20]  Bernardo A. Huberman,et al.  Usage patterns of collaborative tagging systems , 2006, J. Inf. Sci..

[21]  Peter Schlattmann,et al.  Estimating the number of components in a finite mixture model: the special case of homogeneity , 2003, Comput. Stat. Data Anal..

[22]  David Heckerman,et al.  Empirical Analysis of Predictive Algorithms for Collaborative Filtering , 1998, UAI.

[23]  Andreas Hotho,et al.  Tag Recommendations in Folksonomies , 2007, LWA.

[24]  Neil D. Lawrence,et al.  Fast Sparse Gaussian Process Methods: The Informative Vector Machine , 2002, NIPS.

[25]  Yang Song,et al.  A sparse gaussian processes classification framework for fast tag suggestions , 2008, CIKM '08.

[26]  Siegfried Handschuh,et al.  P-TAG: large scale automatic generation of personalized annotation tags for the web , 2007, WWW '07.

[27]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[28]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[29]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[30]  James Ze Wang,et al.  Real-Time Computerized Annotation of Pictures , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Grigorios Tsoumakas,et al.  Multi-Label Classification: An Overview , 2007, Int. J. Data Warehous. Min..