A Parallel and Efficient Algorithm for Learning to Match

Many tasks in data mining and related fields, including collaborative filtering, link prediction, image tagging, and web search, can be formalized as matching between objects in two heterogeneous domains. Machine learning techniques, referred to in this paper as learning-to-match, have been successfully applied to these problems. Among them, a class of state-of-the-art methods named feature-based matrix factorization formalizes the task as an extension of matrix factorization that incorporates auxiliary features into the model. Unfortunately, scaling these algorithms to real-world problems is challenging, and simple parallelization strategies fail because of the complex cross-talk patterns between sub-tasks. In this paper, we tackle this challenge with a novel parallel and efficient algorithm. Our algorithm, based on coordinate descent, can easily handle hundreds of millions of instances and features on a single machine. The key recipe of the algorithm is an iterative relaxation of the objective that enables parallel updates of the parameters, with guaranteed convergence on minimizing the original objective function. Experimental results demonstrate that the proposed method is effective on a wide range of matching problems, significantly improving efficiency over the baselines while retaining the same accuracy.
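The abstract does not give the update rule, so the following is only a minimal sketch of the general idea it names: relax the objective into an upper bound that decouples across coordinates, then update every coordinate of a parameter block at once from the same residual. It assumes a squared loss with L2 regularization and a bilinear feature-based matrix factorization score, score_i = (x_i P) · (z_i Q), where x_i and z_i are the features of the two matched objects. The function names (`predict`, `objective`, `parallel_block_update`, `fit`) are hypothetical, and the Cauchy-Schwarz bound used here is a generic optimization-transfer device swapped in for illustration, not the paper's relaxation.

```python
import numpy as np


def predict(X, Z, P, Q):
    # Bilinear feature-based MF score: (x_i P) . (z_i Q) for every instance i.
    return np.sum((X @ P) * (Z @ Q), axis=1)


def objective(X, Z, y, P, Q, lam):
    # Squared loss plus L2 regularization on both factor matrices.
    r = y - predict(X, Z, P, Q)
    return 0.5 * (r @ r) + 0.5 * lam * (np.sum(P * P) + np.sum(Q * Q))


def parallel_block_update(X, V, r, W, lam):
    # With the other factor fixed (V holds its per-instance projections),
    # the score is linear in W: y_hat_i = sum_{j,k} X[i, j] * V[i, k] * W[j, k].
    # Cauchy-Schwarz, (sum_t a_t d_t)^2 <= n_i * sum_t a_t^2 d_t^2 with n_i the
    # number of nonzero terms in row i, yields an upper bound on the objective
    # that is separable in the coordinates of W and tight at the current W.
    n = np.count_nonzero(X, axis=1) * np.count_nonzero(V, axis=1)
    G = X.T @ (r[:, None] * V)              # G[j, k] = sum_i r_i x_ij v_ik
    H = (X ** 2).T @ (n[:, None] * V ** 2)  # H[j, k] = sum_i n_i x_ij^2 v_ik^2
    # Each coordinate minimizes its own 1-D quadratic, so all entries of W
    # are updated simultaneously from the same residual r (requires lam > 0).
    return W + (G - lam * W) / (H + lam)


def fit(X, Z, y, K=4, lam=0.1, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((X.shape[1], K))
    Q = 0.1 * rng.standard_normal((Z.shape[1], K))
    history = [objective(X, Z, y, P, Q, lam)]
    for _ in range(iters):
        # Block 1: all of P in parallel, Q held fixed.
        r = y - predict(X, Z, P, Q)
        P = parallel_block_update(X, Z @ Q, r, P, lam)
        # Block 2: all of Q in parallel, P held fixed.
        r = y - predict(X, Z, P, Q)
        Q = parallel_block_update(Z, X @ P, r, Q, lam)
        history.append(objective(X, Z, y, P, Q, lam))
    return P, Q, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = (rng.random((200, 30)) < 0.2).astype(float)  # sparse binary features, side A
    Z = (rng.random((200, 40)) < 0.2).astype(float)  # sparse binary features, side B
    y = rng.standard_normal(200)
    P, Q, hist = fit(X, Z, y, iters=30)
    print(f"objective: {hist[0]:.3f} -> {hist[-1]:.3f}")
```

Because the bound equals the objective at the current parameters and lies above it everywhere else, minimizing it can never increase the original objective, which is the kind of convergence guarantee the abstract describes. The trade-off of this particular bound is that it loosens as rows get denser (n_i grows), which shrinks the step sizes; it is meant as an illustration of relaxation-based parallel updates, not a tuned scheme.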
