A hybrid PLSA approach for warmer cold start in folksonomy recommendation

We investigate the problem of item recommendation during the first months of the collaborative tagging community CiteULike. CiteULike is a so-called folksonomy, in which users can organize publications through annotations (tags). Making reliable recommendations during the initial phase of a folksonomy is a difficult task, since information about user preferences is meager. In order to improve recommendation results during this cold start period, we present a probabilistic approach to item recommendation. Our model extends previously proposed models such as probabilistic latent semantic analysis (PLSA) by merging both user-item and item-tag observations into a unified representation. We find that bringing tags into play reduces the risk of overfitting and increases overall recommendation quality. Experiments show that our approach outperforms other types of recommenders.
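
The abstract does not give the model's equations, so the following is only a minimal sketch of one way such a hybrid could look. It assumes that user-item and tag-item count matrices share a single item-given-topic distribution P(i|z), that the two log-likelihoods are combined with a weight `alpha`, and that the parameters are fitted with EM. The function name `hybrid_plsa`, the parameter `alpha`, the toy matrices, and the exact update rules are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def normalize(m, axis):
    """Scale m so that it sums to 1 along the given axis (guarding against 0)."""
    return m / np.maximum(m.sum(axis=axis, keepdims=True), 1e-12)


def hybrid_plsa(n_ui, n_ti, n_topics=2, alpha=0.5, n_iter=50, seed=0):
    """Fit a PLSA-style model on user-item and tag-item counts (assumed form).

    n_ui: (users x items) count matrix, n_ti: (tags x items) count matrix.
    alpha weights the user-item likelihood against the tag-item likelihood.
    Returns P(i|z), P(z|u), P(z|t).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = n_ui.shape
    n_tags = n_ti.shape[0]

    p_i_z = normalize(rng.random((n_topics, n_items)), 1)  # P(i|z), shared
    p_z_u = normalize(rng.random((n_users, n_topics)), 1)  # P(z|u)
    p_z_t = normalize(rng.random((n_tags, n_topics)), 1)   # P(z|t)

    for _ in range(n_iter):
        # E-step: topic posteriors P(z|u,i) and P(z|t,i), shape (rows, Z, I)
        post_u = normalize(p_z_u[:, :, None] * p_i_z[None, :, :], 1)
        post_t = normalize(p_z_t[:, :, None] * p_i_z[None, :, :], 1)

        # Expected counts per (row, topic, item)
        exp_u = n_ui[:, None, :] * post_u
        exp_t = n_ti[:, None, :] * post_t

        # M-step: mixing weights per user and per tag
        p_z_u = normalize(exp_u.sum(axis=2), 1)
        p_z_t = normalize(exp_t.sum(axis=2), 1)
        # M-step: shared item distribution pools both observation types
        p_i_z = normalize(
            alpha * exp_u.sum(axis=0) + (1 - alpha) * exp_t.sum(axis=0), 1
        )

    return p_i_z, p_z_u, p_z_t


# Toy data (made up for illustration): 3 users, 2 tags, 4 items
n_ui = np.array([[1, 1, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 1, 1]], dtype=float)
n_ti = np.array([[2, 1, 0, 0],
                 [0, 0, 1, 2]], dtype=float)

p_i_z, p_z_u, p_z_t = hybrid_plsa(n_ui, n_ti)

# Recommendation scores P(i|u) = sum_z P(i|z) P(z|u), one row per user
scores = p_z_u @ p_i_z
```

In this sketch, setting `alpha=1.0` makes the item distribution depend on user-item counts only, which corresponds to plain PLSA on the user-item matrix; lower values let tag co-occurrence shape the topics when user data is sparse.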
