Efficient Optimization Methods for Extreme Similarity Learning with Nonlinear Embeddings

We study the problem of learning similarity over all possible pairs using nonlinear embedding models (e.g., neural networks). This problem is well known to be difficult to train because of the extreme number of pairs. For the special case of linear embeddings, many studies have handled all pairs efficiently by considering particular loss functions and developing specialized optimization algorithms. This paper extends those results to general nonlinear embeddings. First, we complete detailed derivations and provide clean formulations for efficiently computing the building blocks of optimization algorithms, such as the function value, the gradient, and the Hessian-vector product. These results enable many optimization methods to be applied to extreme similarity learning with nonlinear embeddings. Second, we study several optimization methods in detail. Because nonlinear embeddings are used, implementation issues that do not arise in the linear case must be addressed. In the end, some methods are shown to be highly efficient for extreme similarity learning with nonlinear embeddings.
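As background for the Hessian-vector product mentioned above: for a general nonlinear embedding model, such a product can be obtained with two passes of automatic differentiation (Pearlmutter's trick), without forming the Hessian. The sketch below is a minimal, generic illustration in PyTorch, not the paper's own formulation (which exploits the structure of the all-pairs loss); the model, loss, and data here are placeholder assumptions.

    import torch

    # Placeholder nonlinear embedding model; any torch.nn.Module would do for this sketch.
    model = torch.nn.Sequential(
        torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 4)
    )
    params = [p for p in model.parameters() if p.requires_grad]

    def loss_fn(x, y):
        # Placeholder squared loss; the losses over all pairs in the paper are more involved.
        return ((model(x) - y) ** 2).sum()

    x = torch.randn(32, 8)
    y = torch.randn(32, 4)
    loss = loss_fn(x, y)

    # First pass: gradient with create_graph=True so that it remains differentiable.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Arbitrary direction v with the same shapes as the parameters.
    v = [torch.randn_like(p) for p in params]

    # Second pass: Hv is the gradient of the inner product <grad, v> w.r.t. the parameters.
    grad_dot_v = sum((g * vi).sum() for g, vi in zip(grads, v))
    hvp = torch.autograd.grad(grad_dot_v, params)

    print([h.shape for h in hvp])

Such a product is the basic operation needed by conjugate-gradient-based and Hessian-free (truncated Newton) methods; the efficiency question addressed in the paper is how to evaluate it, together with the function and gradient, without enumerating all pairs.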
