Paper information - Bayesian Regression Analysis in the "Large p, Small n" Paradigm with Application in DNA Microarray S

Statistical modelling and inference problems in which sample sizes are substantially smaller than the number of available and potentially interesting predictors (explanatory variables) abound in applied science and medicine. These “Large p, Small n” problems pose challenges to standard statistical methods and demand new concepts and models for regression and classification. Our motivating applied context is in functional genomics; more specifically, in studies of phenotyping clinical or physiological outcomes in which the predictors are measured expression levels of large numbers of genes based on high-density DNA microarrays. In a canonical framework of binary regression, we discuss (a) issues of regression modelling utilising singular-value decompositions of design matrices that are massively rank deficient, (b) the imperatives for careful, informative prior specifications on high-dimensional regression parameters, (c) the development of new classes of structured prior distributions for this problem, and (d) the development of appropriate computational methods and modes of posterior inference for regression estimation and predictive inference for out-of-sample classification. The latter enterprise is fundamental to genomic phenotyping applications. We study and exemplify the new statistical methodology in a problem of breast cancer phenotyping using DNA microarray expression profiles as predictors, and in discrimination of leukemia types.
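
Point (a) of the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes an n x p design matrix X with p much larger than n (samples in rows), a logistic link, and an independent N(0, prior_var) prior on each factor coefficient. These are simplifying stand-ins for the binary regression model and the structured priors the abstract refers to. The names svd_factors, sigmoid and fit_map_logistic are hypothetical. The key identity is that the thin SVD X = U D V' lets the p-dimensional linear predictor X beta be rewritten as F gamma, with F = U D of size n x k (k at most n) and gamma = V' beta.

```python
import numpy as np

def svd_factors(X, tol=1e-10):
    """Return (F, Vt_k) with X = F @ Vt_k; X is n x p, F is n x k, k <= min(n, p)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    keep = d > tol * d.max()          # drop numerically zero singular values
    F = U[:, keep] * d[keep]          # factor matrix U D
    return F, Vt[keep]                # rows of Vt_k map factors back to predictors

def sigmoid(eta):
    # Numerically stable form of 1 / (1 + exp(-eta)).
    return 0.5 * (1.0 + np.tanh(0.5 * eta))

def fit_map_logistic(F, y, prior_var=1.0, n_iter=50, tol=1e-8):
    """Posterior mode of gamma by Newton steps (assumed logistic link and
    N(0, prior_var * I) prior; an illustrative choice, not the paper's prior)."""
    k = F.shape[1]
    gamma = np.zeros(k)
    for _ in range(n_iter):
        mu = sigmoid(F @ gamma)
        grad = F.T @ (y - mu) - gamma / prior_var
        # Negative Hessian of the log posterior; positive definite thanks to the prior.
        hess = (F.T * (mu * (1.0 - mu))) @ F + np.eye(k) / prior_var
        step = np.linalg.solve(hess, grad)
        gamma += step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

# Synthetic example: 20 samples, 500 predictors (random data, for illustration only).
rng = np.random.default_rng(0)
n, p = 20, 500
X = rng.standard_normal((n, p))
y = (rng.random(n) < 0.5).astype(float)

F, Vt_k = svd_factors(X)              # regression now has at most n coefficients
gamma = fit_map_logistic(F, y)
beta = Vt_k.T @ gamma                 # implied coefficients on the original p predictors
probs = sigmoid(X @ beta)             # same fitted probabilities as sigmoid(F @ gamma)
```

Working in the factor space keeps every linear solve at size k x k rather than p x p, and the prior on gamma is what keeps the estimate finite when the n samples are perfectly separable, which is typical when p is much larger than n.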
