Improving the Scalability of ALS-based Large Recommender Systems with Similar User Index

Alternating Least Squares (ALS) is a popular method for computing matrix factorizations in parallel. However, because of the time complexity of predicting users' preferences, ALS does not scale well to large datasets. In this paper, we propose a parallel matrix factorization approach based on a similar user index. Because groups of similar users are indexed in advance, there is no need to compute similarities between all users in the dataset. Furthermore, the matrix to be factorized is smaller because it contains only the indexed users' ratings and items. The proposed approach is implemented with current cloud computing technologies, including Hadoop, MapReduce, and Amazon EC2. We show empirically that the similar user index resolves the scalability issue of ALS and improves the performance of large-scale recommender systems in a distributed computing environment.
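
The indexing idea can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the abstract does not specify the similarity measure, the group size, or the data layout, so the Euclidean distance over co-rated items, the parameter `k`, the nested-dictionary rating format, and all function names below are assumptions chosen for illustration. The sketch builds the index once, then uses it to extract the reduced matrix (target user plus indexed neighbours, restricted to the items they rated) that a factorization step such as ALS would operate on.

```python
from math import sqrt

def distance(a, b):
    """Euclidean distance over co-rated items (assumed measure); inf if no overlap."""
    common = a.keys() & b.keys()
    if not common:
        return float("inf")
    return sqrt(sum((a[i] - b[i]) ** 2 for i in common))

def build_similar_user_index(ratings, k):
    """Offline step: map each user to at most k nearest users (hypothetical helper)."""
    index = {}
    for u, ru in ratings.items():
        scored = [(distance(ru, rv), v) for v, rv in ratings.items() if v != u]
        # Drop users with no co-rated items, then keep the k closest.
        nearest = sorted((d, v) for d, v in scored if d != float("inf"))
        index[u] = [v for _, v in nearest[:k]]
    return index

def reduced_matrix(ratings, index, user):
    """Keep only the target user, its indexed neighbours, and the items they rated."""
    users = [user] + index[user]
    items = sorted({i for u in users for i in ratings[u]})
    # None marks a missing rating that factorization would later predict.
    matrix = [[ratings[u].get(i) for i in items] for u in users]
    return users, items, matrix

# Toy data: user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 5, "b": 5, "c": 2},
    "u3": {"a": 1, "c": 5},
    "u4": {"d": 3},
}

index = build_similar_user_index(ratings, k=1)
users, items, matrix = reduced_matrix(ratings, index, "u1")
```

For `u1`, the lookup returns a 2-by-3 matrix instead of the full 4-by-4 one, and no similarity is computed at lookup time. Building the index in this sketch still compares all user pairs once; how the paper distributes that step across Hadoop and MapReduce is not described in the abstract.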