SHRec: Scalable Holistic Recommendation

Recommending items to users is a problem of high practical importance. Many web services try to find relevant recommendations for their users, e.g., movies, social-media friends, restaurants, or shopping items. The expansion of the Web and the ever-growing number of people who use web services make the recommendation problem challenging. Locality Sensitive Hashing (LSH) is the best-known scalable technique for nearest-neighbor search in high-dimensional data, and hence it is widely used in industrial recommendation systems. This paper presents an implementation of LSH using Google's MapReduce engine. We apply LSH to a real case study at Google, where we recommend for each web-host a set of outlinks based on the outlink similarity among the web-hosts. We identify performance limitations of LSH that arise from specific properties of the data and that become significant when the data is large. Furthermore, we present SHRec, a novel technique for scalable recommendation that addresses these performance limitations. Based on a real deployment of both SHRec and LSH on Google's infrastructure, and using real data from the Web crawl at Google, from which a sample host-level graph of 1.5 billion web-hosts is extracted, we demonstrate that SHRec is more scalable than LSH. In particular, we show that SHRec is one order of magnitude faster than LSH while achieving better recommendation quality.
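To make the LSH baseline concrete, the following is a minimal single-machine sketch of LSH-based outlink recommendation. It is not the paper's MapReduce implementation and it is not SHRec. The abstract does not state which LSH family is used, so the sketch assumes Jaccard similarity over outlink sets, approximated with MinHash signatures and banding. All function names, parameters (`bands`, `rows`, `top_k`), and the scoring rule are hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: Set[str], num_hashes: int = 32) -> List[int]:
    """MinHash signature: the minimum hash value of the set under each seed."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_pairs(
    outlinks: Dict[str, Set[str]], bands: int = 16, rows: int = 2
) -> Set[Tuple[str, str]]:
    """Hosts whose signatures agree on every row of at least one band."""
    buckets = defaultdict(set)
    for host, links in outlinks.items():
        if not links:
            continue  # a host with no outlinks has no signature
        sig = minhash_signature(links, bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(host)
    pairs = set()
    for hosts in buckets.values():
        pairs.update(combinations(sorted(hosts), 2))
    return pairs


def recommend(
    outlinks: Dict[str, Set[str]],
    host: str,
    bands: int = 16,
    rows: int = 2,
    top_k: int = 5,
) -> List[str]:
    """Rank outlinks of similar hosts that `host` does not already have.

    Assumed scoring rule: each candidate link accumulates the Jaccard
    similarity of every candidate neighbor that links to it.
    """
    own = outlinks[host]
    scores = defaultdict(float)
    for a, b in candidate_pairs(outlinks, bands, rows):
        if host not in (a, b):
            continue
        other = b if a == host else a
        sim = jaccard(own, outlinks[other])
        for link in outlinks[other] - own:
            scores[link] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [link for link, _ in ranked[:top_k]]
```

In this sketch, calling `recommend(outlinks, "a.example")` on a dictionary that maps host names to sets of outlinks returns up to `top_k` links drawn from hosts that share at least one LSH bucket with `a.example`. Increasing `rows` makes bucket collisions rarer and more precise, while increasing `bands` raises recall at the cost of more candidate pairs.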
