Provable Non-linear Inductive Matrix Completion

Consider a standard recommendation/retrieval problem where, given a query, the goal is to retrieve the most relevant items. Inductive matrix completion (IMC) is a standard approach to this problem, in which the query and the items are embedded in a common low-dimensional space; the inner product between a query embedding and an item embedding reflects the relevance of the (query, item) pair. Non-linear IMC (NIMC) uses non-linear networks to embed both the query and the items, and is known to be highly effective for a variety of tasks, such as video recommendation and semantic web search. Despite its wide use, the existing literature lacks a rigorous understanding of NIMC models. A key challenge in analyzing such models is the non-convexity arising from the non-linear embeddings, in addition to the non-convexity arising from the low-dimensional restriction of the embedding space, which is akin to the low-rank restriction in standard matrix completion. In this paper, we provide the first theoretical analysis of a simple NIMC model in the realizable setting, where the relevance score of a (query, item) pair is the inner product between their single-layer neural representations. Our results show that, under mild assumptions, standard (stochastic) gradient descent recovers the ground-truth parameters of the NIMC model, provided it is initialized within a small distance of the optimal parameters, and that a standard tensor method can produce an initialization within this required distance. Furthermore, we show that the number of observed query-item relevance scores required, a key parameter in learning such models, scales nearly linearly with the input dimensionality, matching existing results for standard linear inductive matrix completion.
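
For concreteness, below is a minimal NumPy sketch of this setup: relevance scores are generated by a realizable single-layer ReLU model with ground-truth parameters U*, V*, and mini-batch SGD on the squared loss is run from an initialization close to the optimum. The dimensions, learning rate, batch size, and perturbation-based initialization (a stand-in for the paper's tensor-method initialization) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, k, n = 50, 50, 5, 20000   # query/item input dims, embedding dim, #observations

def relu(z):
    return np.maximum(z, 0.0)

# Ground-truth parameters U*, V* of the single-layer NIMC model.
U_star = rng.normal(size=(d1, k)) / np.sqrt(d1)
V_star = rng.normal(size=(d2, k)) / np.sqrt(d2)

# Observed (query, item, relevance) triples: the relevance score is the
# inner product of the two single-layer neural representations.
X = rng.normal(size=(n, d1))
Y = rng.normal(size=(n, d2))
S = np.sum(relu(X @ U_star) * relu(Y @ V_star), axis=1)

def sgd_step(U, V, xb, yb, sb, lr):
    """One mini-batch SGD step on the squared loss over observed scores."""
    hu, hv = xb @ U, yb @ V              # pre-activations, shape (B, k)
    pu, pv = relu(hu), relu(hv)
    r = np.sum(pu * pv, axis=1) - sb     # residuals, shape (B,)
    gu = xb.T @ (r[:, None] * (hu > 0) * pv) / len(sb)
    gv = yb.T @ (r[:, None] * (hv > 0) * pu) / len(sb)
    return U - lr * gu, V - lr * gv

# Stand-in for the tensor-method initialization: start within a small
# distance of the optimum, the regime the local-convergence result covers
# (this also sidesteps the column-permutation ambiguity of the model).
U = U_star + 0.1 * rng.normal(size=U_star.shape) / np.sqrt(d1)
V = V_star + 0.1 * rng.normal(size=V_star.shape) / np.sqrt(d2)

for _ in range(5000):
    idx = rng.integers(0, n, size=64)
    U, V = sgd_step(U, V, X[idx], Y[idx], S[idx], lr=0.05)

print("parameter error:", np.linalg.norm(U - U_star) + np.linalg.norm(V - V_star))
```

Under these assumptions, the parameter error should shrink toward zero as SGD runs, mirroring the local-convergence guarantee; from a poor initialization, no such behavior is promised.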
