Learning high-dimensional mixed graphical models with missing values

Current biomedical instrumentation enables monitoring an increasing number of mixed discrete and continuous variables. The resulting multivariate samples are amenable to analysis with mixed graphical models. The dimension of these data sets, with a number of variables p much larger than the number of observations n, precludes the direct application of classical learning algorithms, and procedures that work specifically in the p ≫ n setting are required. A further obstacle to the wide application of these procedures in the biomedical field is that missing observations often occur in clinical and genotype data. A large p rules out a simple complete-case analysis and substantially increases the computational burden of using multiply-imputed data sets. Here we show that using limited-order correlations to learn mixed graphical models from data with p ≫ n enables a straightforward and effective application of complete-case analysis to the missing-data problem. More importantly, because complete-case analysis is only appropriate under the restrictive assumption that data are missing completely at random, we adapt an expectation-maximization algorithm to the limited-order correlation framework and demonstrate that it is better suited to the less stringent assumption that data are missing at random.
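
The complete-case argument can be made concrete with a minimal Python sketch. It assumes purely continuous, jointly Gaussian variables (not the mixed discrete and continuous case), missing entries coded as None, and data missing completely at random; the helper names (simulate, complete_rows, covariance, invert, partial_corr_cc) and the simulated chain X0 -> X1 -> X2 are hypothetical illustrations, and the EM adaptation is not shown. Each q-order partial correlation is estimated from the rows that are complete on the q + 2 variables it involves, rather than from the rows that are complete on all p variables.

```python
import math
import random


def simulate(n=100, p=500, miss=0.1, seed=1):
    """Hypothetical p >> n data: a Gaussian chain X0 -> X1 -> X2 plus p - 3
    independent noise variables. Each entry is replaced by None (missing)
    independently with probability `miss`, i.e. missing completely at random."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        x1 = x0 + rng.gauss(0.0, 1.0)
        x2 = x1 + rng.gauss(0.0, 1.0)
        row = [x0, x1, x2] + [rng.gauss(0.0, 1.0) for _ in range(p - 3)]
        data.append([None if rng.random() < miss else v for v in row])
    return data


def complete_rows(data, cols):
    """Indices of the rows observed on every column in `cols` (complete cases)."""
    return [r for r, row in enumerate(data) if all(row[c] is not None for c in cols)]


def covariance(data, rows, cols):
    """Sample covariance matrix of `cols`, computed over `rows` only."""
    n = len(rows)
    means = [sum(data[r][c] for r in rows) / n for c in cols]
    return [[sum((data[r][a] - ma) * (data[r][b] - mb) for r in rows) / (n - 1)
             for b, mb in zip(cols, means)]
            for a, ma in zip(cols, means)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (fine for small q + 2)."""
    k = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        lead = a[col][col]
        a[col] = [x / lead for x in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[k:] for row in a]


def partial_corr_cc(data, i, j, cond):
    """q-order partial correlation of variables i and j given the q variables
    in `cond`, estimated by complete-case analysis restricted to {i, j} U cond.
    Returns (correlation, number of complete cases used)."""
    cols = [i, j] + list(cond)
    rows = complete_rows(data, cols)
    if len(rows) <= len(cols):  # need n > q + 2 for an invertible covariance
        raise ValueError("too few complete cases for this conditioning set")
    prec = invert(covariance(data, rows, cols))  # precision (inverse covariance)
    return -prec[0][1] / math.sqrt(prec[0][0] * prec[1][1]), len(rows)


if __name__ == "__main__":
    data = simulate()
    p = len(data[0])
    # Listwise deletion over all p variables leaves no usable rows here (0).
    print("complete cases over all p:", len(complete_rows(data, range(p))))
    # Restricting to the q + 2 variables involved keeps most rows.
    r0, n0 = partial_corr_cc(data, 0, 2, [])   # q = 0: marginal correlation
    r1, n1 = partial_corr_cc(data, 0, 2, [1])  # q = 1: conditioning on X1
    print(f"q=0: r={r0:.2f} on {n0} rows; q=1: r={r1:.2f} on {n1} rows")
```

With p = 500 and a 10% missing rate, a row is complete on all variables with probability 0.9^500, which is effectively zero, whereas each q-order test keeps about a fraction 0.9^(q+2) of the rows. If the data are missing at random rather than completely at random, these complete-case estimates can be biased, which is the motivation the abstract gives for the EM adaptation.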
