Distributed Collaborative Hashing and Its Applications in Ant Financial

Collaborative filtering, especially the latent factor model, has been widely used in personalized recommendation. A latent factor model learns user and item latent factors from historical user-item behavior. To apply it to real big-data scenarios, efficiency becomes the primary concern, both in offline model training and in online recommendation. In this paper, we propose a Distributed Collaborative Hashing (DCH) model that can significantly improve both. Specifically, we first propose a distributed learning framework, following the state-of-the-art parameter server paradigm, to learn the offline collaborative model. The model is learned efficiently: workers compute subgradients on minibatches in a distributed manner, and servers update the model parameters asynchronously. We then adopt a hashing technique to speed up online recommendation, so that recommendations can be made quickly by looking up hash tables. We conduct thorough experiments on two real large-scale datasets. The experimental results demonstrate that, compared with classic and state-of-the-art (distributed) latent factor models, DCH achieves comparable recommendation accuracy while converging faster in offline model training and delivering real-time efficiency in online recommendation. Furthermore, we show the encouraging performance of DCH in several real-world applications at Ant Financial.
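The two mechanisms above (workers pushing minibatch subgradients to servers, then answering queries from hash tables of binary codes) can be illustrated with a small sketch. The abstract does not give DCH's objective, its binarization rule, or its server/worker protocol, so this sketch assumes plain squared-loss matrix factorization, sign quantization of the learned factors, and a single-process simulation in which workers push gradients computed from stale parameter snapshots. All names (`ParameterServer`, `minibatch_grads`, `train`, `binarize`, `build_table`, `lookup`) are hypothetical and are not the paper's implementation.

```python
# Hypothetical sketch: squared-loss matrix factorization stands in for DCH's
# objective, and sign quantization stands in for its hashing step.
import itertools
import random
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ParameterServer:
    """Single-process stand-in for servers holding user (U) and item (V) factors."""

    def __init__(self, n_users, n_items, k, seed=0):
        rng = random.Random(seed)
        self.U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
        self.V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]

    def pull(self):
        # A worker receives a copy, which may be stale by the time it pushes.
        return [row[:] for row in self.U], [row[:] for row in self.V]

    def push(self, grads, lr):
        # Apply sparse gradients in arrival order, without locks or versioning.
        for (name, idx), g in grads.items():
            row = self.U[idx] if name == "U" else self.V[idx]
            for f, gf in enumerate(g):
                row[f] -= lr * gf


def minibatch_grads(U, V, batch, lam):
    """Gradients, over (u, i, r) in batch, of (u.v - r)^2 + lam * (|u|^2 + |v|^2)."""
    k = len(U[0])
    grads = defaultdict(lambda: [0.0] * k)  # only rows touched by the batch
    for u, i, r in batch:
        err = dot(U[u], V[i]) - r
        gu, gv = grads[("U", u)], grads[("V", i)]
        for f in range(k):
            gu[f] += 2 * err * V[i][f] + 2 * lam * U[u][f]
            gv[f] += 2 * err * U[u][f] + 2 * lam * V[i][f]
    return grads


def train(ps, ratings, n_workers=4, rounds=50, batch_size=8, lr=0.05, lam=0.01, seed=0):
    rng = random.Random(seed)
    for _ in range(rounds):
        # All workers pull before any pushes, so every push but the first is
        # computed from stale parameters: a deterministic stand-in for asynchrony.
        snapshots = [ps.pull() for _ in range(n_workers)]
        for U, V in snapshots:
            batch = rng.sample(ratings, batch_size)
            ps.push(minibatch_grads(U, V, batch, lam), lr)


def loss(U, V, ratings):
    return sum((dot(U[u], V[i]) - r) ** 2 for u, i, r in ratings)


def binarize(X):
    # One bit per latent dimension: 1 if the factor is positive, else 0.
    return [tuple(int(x > 0) for x in row) for row in X]


def build_table(item_codes):
    # Hash table from a binary code to the items that share it.
    table = defaultdict(list)
    for item, code in enumerate(item_codes):
        table[code].append(item)
    return table


def lookup(table, code, radius):
    """Items whose code is within Hamming distance `radius` of `code`, nearest first."""
    hits = []
    for d in range(radius + 1):
        for flips in itertools.combinations(range(len(code)), d):
            probe = list(code)
            for b in flips:
                probe[b] ^= 1
            hits.extend(table.get(tuple(probe), []))
    return hits
```

After `train(ps, ratings)`, calling `lookup(build_table(binarize(ps.V)), binarize(ps.U)[u], radius=1)` returns candidate items for user `u`, ordered by Hamming distance. Probing every code within radius r costs the sum of C(b, d) for d up to r, where b is the code length, which grows quickly with b; techniques such as multi-index hashing exist to reduce this cost. Sign quantization of independently trained factors is also only a rough proxy for learned hash codes.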
