Networked Exponential Families for Big Data Over Networks

The data generated in many application domains can be modeled as big data over networks, i.e., massive collections of high-dimensional local datasets related via an intrinsic network structure. Machine learning for big data over networks must jointly leverage the information contained in the local datasets and their network structure. We propose networked exponential families as a novel probabilistic modeling framework for machine learning from big data over networks. We interpret the high-dimensional local datasets as realizations of a random process distributed according to some exponential family. Networked exponential families allow us to jointly leverage the information contained in the local datasets and their network structure in order to learn a tailored model for each local dataset. We formulate the task of learning the parameters of networked exponential families as a convex optimization problem. This optimization problem is an instance of the network Lasso and enforces a data-driven pooling (or clustering) of the local datasets according to their exponential family parameters. We derive an upper bound on the estimation error of the network Lasso. This upper bound depends on the network structure and on the information geometry of the node-wise exponential families. The insights provided by this bound can be used to determine how much data must be collected or observed to ensure that the network Lasso is accurate. We also provide a scalable implementation of the network Lasso as message passing between adjacent local datasets. Such message passing is appealing for federated machine learning relying on edge computing. Finally, the proposed method is privacy-preserving because no raw data, only parameter estimates, are shared among nodes.
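To make the message-passing idea concrete, here is a minimal sketch under several assumptions. Every node is assumed to carry a unit-variance Gaussian family, whose log-partition function is A(w) = ||w||^2 / 2 and whose sufficient statistic is the observation itself. The node-wise negative log-likelihood of m_i samples with local mean y_i therefore reduces, up to a constant, to (m_i / 2) ||w_i - y_i||^2, and the network Lasso objective becomes sum_i (m_i / 2) ||w_i - y_i||^2 + lam * sum_{(i,j) in E} A_ij ||w_i - w_j||_2. The sketch solves this with a diagonally preconditioned primal-dual (Chambolle-Pock) iteration. That is one possible solver, not necessarily the one used in the paper. The function name `network_lasso_gaussian` and its parameters are hypothetical. Each iteration only exchanges parameter estimates and edge-wise dual variables between neighbors, never raw data.

```python
import numpy as np


def network_lasso_gaussian(edges, weights, y, m, lam, num_iters=5000):
    """Hypothetical sketch: network Lasso for node-wise unit-variance Gaussians.

    Solves  min_W  sum_i (m_i / 2) * ||w_i - y_i||^2
                   + lam * sum_{e=(i,j)} A_e * ||w_i - w_j||_2
    with a diagonally preconditioned primal-dual (Chambolle-Pock) iteration.

    edges   : list of (i, j) node index pairs, each undirected edge listed once
    weights : positive edge weights A_e
    y       : (n, d) array of local sample means (sufficient statistics)
    m       : (n,) array of local sample sizes (0 = node without local data)
    lam     : regularization strength, assumed > 0
    """
    edges = np.asarray(edges, dtype=int)
    weights = np.asarray(weights, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    n, d = y.shape
    src, dst = edges[:, 0], edges[:, 1]

    # Preconditioned step sizes for the +-1 incidence matrix D:
    # tau_i = 1 / (number of edges at node i), sigma_e = 1 / 2.
    deg = np.zeros(n)
    np.add.at(deg, src, 1.0)
    np.add.at(deg, dst, 1.0)
    if np.any(deg == 0):
        raise ValueError("sketch assumes every node has at least one edge")
    tau = (1.0 / deg)[:, None]
    sigma = 0.5
    radius = lam * weights  # dual variable of edge e lives in a ball of radius lam * A_e

    w = np.zeros((n, d))
    w_bar = w.copy()
    u = np.zeros((len(edges), d))
    for _ in range(num_iters):
        # Dual step: edge e receives the extrapolated estimates of its two end nodes,
        # then is projected back onto its ball (prox of the conjugate of the TV term).
        u = u + sigma * (w_bar[src] - w_bar[dst])
        norms = np.linalg.norm(u, axis=1)
        u = u / np.maximum(1.0, norms / radius)[:, None]
        # Primal step: node i sums the dual messages of its incident edges (D^T u).
        msg = np.zeros((n, d))
        np.add.at(msg, src, u)
        np.add.at(msg, dst, -u)
        v = w - tau * msg
        # Proximal operator of (m_i / 2) * ||w - y_i||^2 with step tau_i.
        w_new = (v + tau * m[:, None] * y) / (1.0 + tau * m[:, None])
        # Extrapolation step of the Chambolle-Pock scheme.
        w_bar = 2.0 * w_new - w
        w = w_new
    return w


if __name__ == "__main__":
    # Toy chain 0-1-2-3-4-5 with a weak edge (2, 3) between two clusters.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    weights = [1.0, 1.0, 0.1, 1.0, 1.0]
    y = np.array([[1.0], [1.0], [1.0], [-1.0], [-1.0], [-1.0]])
    m = np.ones(6)
    w_hat = network_lasso_gaussian(edges, weights, y, m, lam=0.5)
    print(np.round(w_hat.ravel(), 3))
```

In this toy chain, the optimality conditions show that for lam = 0.5 the exact minimizer is piecewise constant: 1 - 0.05/3 (about 0.983) on nodes 0, 1, 2 and its negative on nodes 3, 4, 5, i.e., the weak edge separates two clusters. For a sufficiently large lam (e.g. lam = 100) all nodes fuse to the global mean 0. Nodes without local data can be given m_i = 0; their estimates are then determined entirely by their neighbors.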
