Efficient inference in matrix-variate Gaussian models with i.i.d. observation noise

Inference in matrix-variate Gaussian models has important applications in multi-output prediction and in the joint learning of row and column covariances from matrix-variate data. Here, we discuss an approach for efficient inference in such models that explicitly accounts for i.i.d. observation noise. Computational tractability can be retained by exploiting the Kronecker product structure of the row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We demonstrate practical utility in applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders.
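To illustrate how a Kronecker-structured covariance can keep inference tractable in the presence of i.i.d. noise, the following is a minimal sketch, not the authors' implementation. It assumes a zero-mean model vec(Y) ~ N(0, C ⊗ R + σ²I) with column-major vec, where R (n × n) is the row covariance and C (d × d) is the column covariance; the function name `kron_gaussian_loglik` and its arguments are hypothetical. The sketch uses the standard identity that the eigenvectors of C ⊗ R + σ²I are the Kronecker product of the eigenvectors of C and R, so the full nd × nd matrix never has to be formed.

```python
import numpy as np

def kron_gaussian_loglik(Y, R, C, sigma2):
    """Log-density of vec(Y) ~ N(0, kron(C, R) + sigma2 * I), column-major vec.

    Hypothetical helper for illustration. Y is n x d, R is n x n (rows),
    C is d x d (columns), and sigma2 > 0 is the i.i.d. noise variance.
    """
    n, d = Y.shape
    # Eigendecompose the two small covariances instead of the nd x nd matrix.
    s_r, U_r = np.linalg.eigh(R)
    s_c, U_c = np.linalg.eigh(C)
    # Rotate the data into the joint eigenbasis: vec(U_r^T Y U_c).
    Y_rot = U_r.T @ Y @ U_c
    # Eigenvalues of kron(C, R) + sigma2 * I, arranged as an n x d grid.
    D = np.outer(s_r, s_c) + sigma2
    log_det = np.sum(np.log(D))
    quad = np.sum(Y_rot ** 2 / D)
    return -0.5 * (n * d * np.log(2.0 * np.pi) + log_det + quad)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 6, 4
    A = rng.standard_normal((n, 2))
    R = A @ A.T                      # low-rank row covariance (assumed example)
    B = rng.standard_normal((d, d))
    C = B @ B.T + np.eye(d)          # full-rank column covariance
    Y = rng.standard_normal((n, d))
    print(kron_gaussian_loglik(Y, R, C, sigma2=0.5))
```

Under these assumptions the cost is dominated by the two eigendecompositions, roughly O(n³ + d³), rather than the O(n³d³) of a dense solve. Because σ² > 0 is added to every eigenvalue, the row covariance may be low rank, as in the confounder setting described above.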