Optimal link prediction with matrix logistic regression

We consider the problem of link prediction based on partial observation of a large network and on side information associated with its vertices. The generative model is formulated as a matrix logistic regression. The performance of the model is analysed in a high-dimensional regime under a structural assumption. The minimax rate for the Frobenius-norm risk is established, and a combinatorial estimator based on penalised maximum likelihood is shown to achieve it. Furthermore, it is shown that, under a computational complexity assumption, this rate cannot be attained by any (randomised) algorithm computable in polynomial time.
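
To make the generative model concrete, the following is a minimal simulation sketch of one common way to write a matrix logistic regression for links. It is an illustration under stated assumptions, not the paper's exact specification: the abstract does not give the parametrisation, so the link probability sigmoid(x_i^T Theta x_j), the symmetric rank-2 choice of Theta, and the names `simulate_links`, `X`, `Theta`, `n`, and `d` are all hypothetical.

```python
import numpy as np

def sigmoid(t):
    """Logistic function mapping real scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-t))

def simulate_links(X, Theta, rng):
    """Sample an undirected graph from vertex features.

    Assumed model (not taken from the abstract): vertices i and j are linked
    with probability sigmoid(x_i^T Theta x_j), independently across pairs.
    """
    P = sigmoid(X @ Theta @ X.T)        # n x n matrix of link probabilities
    n = X.shape[0]
    U = rng.random((n, n))              # independent uniforms, one per pair
    upper = np.triu(U < P, k=1)         # decide each pair i < j exactly once
    A = (upper | upper.T).astype(int)   # symmetrise; diagonal stays zero
    return A, P

rng = np.random.default_rng(0)
n, d = 50, 5                            # hypothetical sizes
X = rng.normal(size=(n, d))             # side information: one row per vertex
B = rng.normal(size=(d, 2))
Theta = B @ B.T                         # symmetric, rank-2 parameter (assumption)
A, P = simulate_links(X, Theta, rng)
```

In this sketch, the estimation problem would be to recover `Theta` from `X` and a partially observed `A`, with the error measured in Frobenius norm.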
