Paper information: High Dimensional Modelling

This chapter describes methods suitable for high-dimensional graphical modeling. Recent years have seen intense interest in applying graphical modeling techniques to high-dimensional data, by which we mean data with hundreds to tens of thousands of variables. Such data arise routinely in fields such as molecular biology. We first describe two typical datasets: one from a study of gene expression in breast cancer patients, and the other from the HapMap project, in which a large number of genomic markers and gene expression measurements are recorded for 90 individuals. We then compare the computational efficiency of several model selection algorithms, as applied to one of the example datasets. Of these, an extension of the Chow-Liu algorithm that finds the minimal BIC forest, implemented in the gRapHD package, is found to be the most efficient. The glasso algorithm and a stepwise decomposable search algorithm are also highly efficient. We describe these algorithms in more detail and illustrate their use on the example datasets. Finally, as a more advanced example, we show how a Bayesian equivalent of the minimal BIC forest algorithm for high-dimensional discrete data may be obtained. Assuming a hyper-Dirichlet prior, the maximum a posteriori forest is derived by using the extended Chow-Liu algorithm with appropriate user-defined edge weights. This is illustrated using a subset of the HapMap data.
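The minimal BIC forest idea mentioned above can be sketched in a few lines for purely discrete data. This is an illustrative sketch only, not the gRapHD implementation: the function names `mutual_information` and `min_bic_forest` are hypothetical, and the edge weight used here (twice the sample size times the empirical mutual information, minus log(n) times the degrees of freedom of the pairwise interaction) is an assumption derived from the usual definition of BIC. Edges with non-positive weight are discarded, and a Kruskal-style greedy pass over the remaining edges yields a forest rather than a spanning tree.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def min_bic_forest(columns):
    """Greedy maximum-weight forest with BIC-penalised edge weights.

    columns: dict mapping a variable name to its list of discrete values.
    Assumed weight for edge (u, v): 2 * n * MI(u, v) - log(n) * df, where
    df = (levels(u) - 1) * (levels(v) - 1). Only edges with positive
    weight are candidates, so the result may be a forest, not a tree.
    """
    names = list(columns)
    n = len(columns[names[0]])
    candidates = []
    for u, v in combinations(names, 2):
        df = (len(set(columns[u])) - 1) * (len(set(columns[v])) - 1)
        mi = mutual_information(columns[u], columns[v])
        weight = 2 * n * mi - math.log(n) * df
        if weight > 0:
            candidates.append((weight, u, v))

    # Union-find structure used to reject edges that would close a cycle.
    parent = {name: name for name in names}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    forest = []
    for weight, u, v in sorted(candidates, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            forest.append((u, v))
    return forest


# Toy data: b copies a exactly, while c is empirically independent of both.
data = {
    "a": [0, 1] * 20,
    "b": [0, 1] * 20,
    "c": [0, 0, 1, 1] * 10,
}
forest = min_bic_forest(data)
print(forest)  # [('a', 'b')]
```

In this toy example the penalty removes the two edges involving `c`, so the selected graph has one edge and one isolated vertex. Supplying different edge weights to the same greedy pass is the general mechanism the abstract refers to when it mentions user-defined weights.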

Søren Højsgaard | Steffen L. Lauritzen | David Edwards
