Stable Graphical Models

Stable random variables are motivated by the generalized central limit theorem for random variables with (potentially) infinite variance, and they can be thought of as natural generalizations of the Gaussian distribution to skewed and heavy-tailed phenomena. In this paper, we introduce stable graphical (SG) models, a class of multivariate stable densities that can also be represented as Bayesian networks whose edges encode linear dependencies between random variables. One major hurdle to the widespread use of stable distributions is that their densities generally lack a closed-form analytical expression, which makes learning based on penalized maximum likelihood computationally demanding. We show theoretically that the Bayesian information criterion (BIC) can be reduced asymptotically to the computationally more tractable minimum dispersion criterion (MDC), and we develop StabLe, a structure learning algorithm based on MDC. Using simulated datasets for five benchmark network topologies, we demonstrate empirically that StabLe improves upon ordinary least squares (OLS) regression. We also apply StabLe to microarray gene expression data for lymphoblastoid cells from 727 individuals belonging to eight global population groups, where ten-fold cross-validation shows that StabLe improves test-set performance relative to OLS. Finally, we develop SGEX, a method for quantifying differential expression of genes between population groups.
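The abstract does not spell out how minimum-dispersion estimation works, so here is a minimal sketch under stated assumptions. It is not StabLe itself. It simulates a two-node SG network X1 -> X2 with symmetric alpha-stable variables, drawn with the Chambers-Mallows-Stuck transform (which needs no closed-form density). It then fits the edge weight two ways: OLS, and least-Lp regression with p < alpha solved by iteratively reweighted least squares (IRLS). The assumption linking least-Lp regression to dispersion is this: for a symmetric alpha-stable variable with scale gamma, E|X|^p is proportional to gamma^p whenever 0 < p < alpha. Minimizing the empirical mean of |residual|^p is therefore a proxy for minimizing the residual dispersion gamma^alpha. The function names (`sym_stable`, `weighted_ls`, `least_lp`) and all parameter values are hypothetical.

```python
import math
import random


def sym_stable(alpha, scale=1.0):
    """One symmetric alpha-stable draw (beta = 0, location 0) via the
    Chambers-Mallows-Stuck transform of a uniform angle V and an Exp(1) W."""
    v = random.uniform(-math.pi / 2, math.pi / 2)
    w = random.expovariate(1.0)
    if alpha == 1.0:
        return scale * math.tan(v)  # Cauchy special case
    x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    return scale * x  # alpha = 2 gives a Gaussian with variance 2 * scale**2


def weighted_ls(x, y, w):
    """Closed-form weighted least squares for y = a + b * x."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    a = (sy - b * sx) / sw
    return a, b


def least_lp(x, y, p, iters=100, eps=1e-6):
    """Minimize sum |y - a - b*x|**p by IRLS (assumed stand-in for a
    minimum-dispersion fit); weights |r|**(p - 2), clamped at eps."""
    a, b = weighted_ls(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(iters):
        w = [max(abs(yi - a - b * xi), eps) ** (p - 2.0)
             for xi, yi in zip(x, y)]
        a, b = weighted_ls(x, y, w)
    return a, b


if __name__ == "__main__":
    random.seed(0)
    alpha, p, n = 1.5, 1.0, 2000  # p < alpha keeps E|residual|**p finite
    # Hypothetical network X1 -> X2 with X2 = 1 + 2 * X1 + stable noise.
    x1 = [sym_stable(alpha) for _ in range(n)]
    x2 = [1.0 + 2.0 * xi + sym_stable(alpha, 0.5) for xi in x1]
    print("OLS    (a, b):", weighted_ls(x1, x2, [1.0] * n))
    print("min-Lp (a, b):", least_lp(x1, x2, p))
```

Choosing p = 1 keeps the objective convex, so IRLS behaves well. Values of p below 1 would also satisfy p < alpha but make the fit non-convex.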
