Learning Networked Exponential Families with Network Lasso

We propose networked exponential families, which jointly leverage the network topology and the attributes (features) of networked data points. Networked exponential families are a flexible probabilistic model for heterogeneous datasets with an intrinsic network structure. These models can be learned efficiently using the network Lasso, which implicitly pools or clusters the data points according to the intrinsic network structure and the local likelihood. The resulting method can be formulated as a non-smooth convex optimization problem, which we solve using a primal-dual splitting method. This primal-dual method is appealing for big data applications because it can be implemented as a highly scalable message-passing algorithm.
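As a rough illustration of the optimization problem described above, the following sketch applies a generic primal-dual (Chambolle-Pock style) iteration to a deliberately simplified special case. It is not the paper's implementation. The assumptions are: every node i carries a single scalar observation y_i with a Gaussian local likelihood (squared-error loss), every node is observed, and the objective is (1/2) * sum_i (w_i - y_i)^2 + lam * sum_{(i,j)} A_ij * |w_i - w_j|. The function name `network_lasso_gaussian`, its parameters, and the step-size choice are hypothetical and chosen only for this example.

```python
import numpy as np


def network_lasso_gaussian(y, edges, weights, lam, n_iter=5000):
    """Primal-dual sketch for scalar network Lasso with squared-error loss.

    Assumptions (not taken from the paper): scalar node weights, all nodes
    observed, at least one edge, and a dense incidence matrix for a small graph.
    """
    y = np.asarray(y, dtype=float)
    n, m = len(y), len(edges)

    # Weighted incidence matrix D: (D w)_e = A_ij * (w_i - w_j) for edge e = (i, j).
    D = np.zeros((m, n))
    for e, (i, j) in enumerate(edges):
        D[e, i] = weights[e]
        D[e, j] = -weights[e]

    # Step sizes chosen so that tau * sigma * ||D||^2 = 0.81 < 1.
    op_norm = np.linalg.norm(D, 2)
    tau = sigma = 0.9 / op_norm

    w = np.zeros(n)      # primal variable: one weight per node
    w_bar = w.copy()     # extrapolated primal variable
    u = np.zeros(m)      # dual variable: one value per edge

    for _ in range(n_iter):
        # Dual step: projection onto the box [-lam, lam] (edge-local update).
        u = np.clip(u + sigma * (D @ w_bar), -lam, lam)
        # Primal step: proximal operator of the squared-error loss (node-local update).
        w_new = (w - tau * (D.T @ u) + tau * y) / (1.0 + tau)
        # Extrapolation step.
        w_bar = 2.0 * w_new - w
        w = w_new
    return w


if __name__ == "__main__":
    # Chain graph with six nodes and two groups of observations.
    chain_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    w_hat = network_lasso_gaussian([0, 0, 0, 4, 4, 4], chain_edges, [1] * 5, lam=0.6)
    print(np.round(w_hat, 3))
```

Each dual update touches only one edge and its two end nodes, and each primal update touches only one node and its incident edges, which is what makes this kind of iteration natural to implement as message passing over the graph. For the toy chain above, the minimizer can be worked out by hand from the optimality conditions: the first three nodes share the value lam/3 = 0.2 and the last three share 4 - lam/3 = 3.8, so the iterates should approach a clustered, piecewise-constant solution.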
