Abstract

Many applications infer the structure of a probabilistic graphical model from data to elucidate the relationships between variables. But how can we train graphical models on a massive data set? In this paper, we show how to construct coresets, i.e., compressed data sets that can serve as a proxy for the original data and have provably bounded worst-case error, for Gaussian dependency networks (DNs), i.e., cyclic directed graphical models over Gaussians in which the parents of each variable form its Markov blanket. Specifically, we prove that Gaussian DNs admit coresets of size independent of the size of the data set. Unfortunately, this does not extend to DNs over members of the exponential family in general: as we prove, Poisson DNs do not admit small coresets. Despite this worst-case result, we provide an argument for why our coreset construction for DNs can still work well in practice on count data. To corroborate our theoretical results, we empirically evaluate the resulting Core DNs on real data sets. The results demonstrate significant gains over no sub-sampling and over naive sub-sampling, even in the case of count data.

Introduction

Artificial intelligence and machine learning have achieved considerable successes in recent years, and an ever-growing number of disciplines rely on them. Data is now ubiquitous, and there is great value in understanding the data, e.g., in building probabilistic graphical models to elucidate the relationships between variables. In the big data era, however, scalability has become crucial for any useful machine learning approach.

In this paper, we consider the problem of training graphical models, in particular Dependency Networks (Heckerman et al. 2000), on massive data sets. Dependency Networks are cyclic directed graphical models in which the parents of each variable form its Markov blanket. They have proven successful in various tasks, such as collaborative filtering (Heckerman et al. 2000), phylogenetic analysis (Carlson et al. 2008), genetic analysis (Dobra 2009; Phatak et al. 2010), network inference from sequencing data (Allen and Liu 2013), and traffic as well as topic modeling (Hadiji et al. 2015).

Specifically, we show that Dependency Networks over Gaussians, arguably one of the most prominent types of distribution in statistical machine learning, admit coresets of size independent of the size of the data set. Coresets are weighted subsets of the data which guarantee that models fitting them will also provide a good fit for the original data set. They have been studied before for clustering (Badoiu, Har-Peled, and Indyk 2002; Feldman, Faulkner, and Krause 2011; Feldman, Schmidt, and Sohler 2013; Lucic, Bachem, and Krause 2016), classification (Har-Peled, Roth, and Zimak 2007; Har-Peled 2015; Reddi, Póczos, and Smola 2015), regression (Drineas, Mahoney, and Muthukrishnan 2006; 2008; Dasgupta et al. 2009; Geppert et al. 2017), and the smallest enclosing ball problem (Badoiu and Clarkson 2003; 2008; Feldman, Munteanu, and Sohler 2014; Agarwal and Sharathkumar 2015); we refer to Phillips (2017) for a recent, extensive literature overview. Our contribution continues this line of research and generalizes the use of coresets to probabilistic graphical modeling.
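For concreteness, the guarantee behind coresets is usually formalized as follows; this is the standard (1 ± ε)-approximation notion from the coreset literature cited above, stated with generic placeholders $\mathrm{cost}$ and $\mathcal{Q}$ rather than this paper's specific notation. A weighted subset $C$ of the data $X$ is an $\varepsilon$-coreset with respect to a class of queries (candidate models) $\mathcal{Q}$ if

\[
\forall q \in \mathcal{Q}: \quad \bigl| \mathrm{cost}_C(q) - \mathrm{cost}_X(q) \bigr| \;\le\; \varepsilon \cdot \mathrm{cost}_X(q),
\]

where $\mathrm{cost}_C$ evaluates the weighted cost on the coreset. Any model that is near-optimal on $C$ is then provably near-optimal on $X$ as well.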
Unfortunately, this coreset result does not extend to Dependency Networks over members of the exponential family in general. We prove that Dependency Networks over Poisson random variables (Allen and Liu 2013; Hadiji et al. 2015) do not admit (sublinear-size) coresets: every single input point is important for the model and needs to appear in the coreset. This is unfortunate when modeling count data, the primary target of Poisson distributions, which is at the center of many scientific endeavors such as citation counts, numbers of web page hits, counts of procedures in medicine, etc. Therefore, despite our worst-case result, we provide an argument for why our coreset construction for Dependency Networks can still work well in practice on count data. To corroborate our theoretical results, we empirically evaluate the resulting Core Dependency Networks (CDNs) on several real data sets and demonstrate significant gains over no sub-sampling and over naive sub-sampling, even for count data.

We proceed as follows. We review Dependency Networks (DNs), prove that Gaussian DNs admit sublinear-size coresets, and discuss the possibility of generalizing this result to count data. Before concluding, we present empirical results.

Dependency Networks

Most of the existing AI and machine learning literature on graphical models is dedicated to binary, multinomial, or certain classes of continuous (e.g., Gaussian) random variables. Undirected models, aka Markov Random Fields (MRFs), such as Ising (binary random variables) and Potts (multinomial random variables) models, have found many applications.
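This paper, by contrast, focuses on dependency networks. To make the earlier description concrete, recall the formulation of Heckerman et al. (2000): a dependency network over variables $X_1, \dots, X_d$ is given by one local conditional distribution per variable, conditioned on all remaining variables,

\[
P_i\left(x_i \mid \mathbf{x}_{-i}\right), \qquad i = 1, \dots, d,
\]

where $\mathbf{x}_{-i}$ denotes all variables except $X_i$; the parents of $X_i$ are the variables its conditional actually depends on, i.e., its Markov blanket. In the Gaussian case, each local conditional is a linear Gaussian (a standard fact about multivariate Gaussians, restated here as a hedged sketch rather than this paper's exact notation),

\[
X_i \mid \mathbf{x}_{-i} \;\sim\; \mathcal{N}\!\left( \mathbf{w}_i^{\top} \mathbf{x}_{-i} + b_i,\; \sigma_i^2 \right),
\]

so learning a Gaussian DN reduces to $d$ independent least-squares ($\ell_2$) regression problems, one per variable, which is precisely the setting in which the $\ell_2$-regression coresets cited above apply.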
References

[1] Piotr Indyk, et al. Approximate clustering via core-sets. STOC '02, 2002.
[2] S. Muthukrishnan, et al. Relative-Error CUR Matrix Decompositions. SIAM J. Matrix Anal. Appl., 2007.
[3] Pradeep Ravikumar, et al. Graphical models via univariate exponential family distributions. J. Mach. Learn. Res., 2013.
[4] Pedro M. Domingos, et al. Sum-product networks: A new deep architecture. IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011.
[5] Christian Wietfeld, et al. LTE Connectivity and Vehicular Traffic Prediction Based on Machine Learning Approaches. IEEE 82nd Vehicular Technology Conference (VTC2015-Fall), 2015.
[6] William J. Wilson, et al. NetRaVE: constructing dependency networks using sparse linear regression. Bioinform., 2010.
[7] Michael W. Mahoney. Randomized Algorithms for Matrices and Data. Found. Trends Mach. Learn., 2011.
[8] Dan Feldman, et al. Smallest enclosing ball for probabilistic data. SoCG, 2014.
[9] Andreas Krause, et al. Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures. AISTATS, 2015.
[10] Andreas Krause, et al. Scalable Training of Mixture Models via Coresets. NIPS, 2011.
[11] Mark Rudelson, et al. Sampling from large matrices: An approach through geometric functional analysis. JACM, 2005.
[12] Genevera I. Allen, et al. A Local Poisson Graphical Model for Inferring Networks From Sequencing Data. IEEE Transactions on NanoBioscience, 2013.
[13] Joel A. Tropp, et al. Improved Analysis of the subsampled Randomized Hadamard Transform. Adv. Data Sci. Adapt. Anal., 2010.
[14] David B. Dunson, et al. Lognormal and Gamma Mixed Negative Binomial Regression. ICML, 2012.
[15] Pankaj K. Agarwal, et al. Streaming Algorithms for Extent Problems in High Dimensions. SODA '10, 2010.
[16] Christian Sohler, et al. Random projections for Bayesian regression. Statistics and Computing, 2015.
[17] Alexander J. Smola, et al. Communication Efficient Coresets for Empirical Loss Minimization. UAI, 2015.
[18] Kristian Kersting, et al. Poisson Sum-Product Networks: A Deep Architecture for Tractable Multivariate Poisson Distributions. AAAI, 2017.
[19] David Heckerman, et al. Phylogenetic Dependency Networks: Inferring Patterns of CTL Escape and Codon Covariation in HIV-1 Gag. PLoS Comput. Biol., 2008.
[20] A. Dobra. Variable selection and dependency networks for genomewide data. Biostatistics, 2009.
[21] Kenneth L. Clarkson, et al. Smaller core-sets for balls. SODA '03, 2003.
[22] Sariel Har-Peled. A Simple Algorithm for Maximum Margin Classification, Revisited. ArXiv, 2015.
[23] Jeff M. Phillips, et al. Coresets and Sketches. ArXiv, 2016.
[24] Ping Ma, et al. A statistical perspective on algorithmic leveraging. J. Mach. Learn. Res., 2013.
[25] David P. Woodruff, et al. Low rank approximation and regression in input sparsity time. STOC '13, 2013.
[26] David Heckerman, et al. Dependency Networks for Density Estimation, Collaborative Filtering, and Data Visualization. 2000.
[27] Kenneth L. Clarkson, et al. Optimal core-sets for balls. Comput. Geom., 2008.
[28] Fabian Hadiji, et al. Poisson Dependency Networks: Gradient Boosted Models for Multivariate Count Data. Machine Learning, 2015.
[29] Ravi Kumar, et al. The One-Way Communication Complexity of Hamming Distance. Theory Comput., 2008.
[30] J. Besag. Statistical Analysis of Non-Lattice Data. 1975.
[31] Dan Roth, et al. Maximum Margin Coresets for Active and Noise Tolerant Learning. IJCAI, 2007.
[32] H. Friedl. Econometric Analysis of Count Data. 2002.
[33] Rajeev Motwani, et al. Randomized Algorithms. Cambridge University Press, 1995.
[34] S. Muthukrishnan, et al. Sampling algorithms for l2 regression and applications. SODA '06, 2006.
[35] Anirban Dasgupta, et al. Sampling algorithms and coresets for ℓp regression. SODA '08, 2008.
[36] Dan Feldman, et al. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. SODA, 2013.
[37] Yoshua Bengio, et al. Deep Generative Stochastic Networks Trainable by Backprop. ICML, 2013.
[38] P. McCullagh, et al. Generalized Linear Models. 1984.
[39] L. Schulman, et al. Universal ε-approximators for integrals. SODA '10, 2010.