Feature‐Specific Penalized Latent Class Analysis for Genomic Data

Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We first consider a latent class model where, conditional on unobserved membership in one of k classes, the variables are independent with probabilities determined by a regression model of low dimension q. Using a family of penalties including the ridge and LASSO, we extend this model to address higher-dimensional problems. Finally, we present an orthogonal map that transforms marker space to a space of "features" for which the constrained model has better predictive power. We demonstrate these methods on LOH data collected at 19 markers from 93 brain tumor patients. For this data set, the existing unpenalized latent class methodology does not produce estimates. Additionally, we show that posterior classes obtained from this method are associated with survival for these patients.
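As a rough illustration of the ingredients named above, the following Python sketch computes posterior class probabilities for one subject under conditional independence and applies a power-family penalty to a log-likelihood. It is a sketch under stated assumptions, not the paper's implementation. The function names, the handling of missing markers (skipped, which assumes missingness is ignorable), and the penalty form lam * sum(|b|^power) (power 1 for the LASSO, power 2 for ridge) are all assumptions. The low-dimensional regression model for the marker probabilities and the orthogonal feature map are not shown.

```python
import math

def posterior_classes(y, class_weights, marker_probs):
    """Posterior class probabilities for one subject.

    y             -- list of 0/1 outcomes, with None for a missing marker
    class_weights -- prior class probabilities, one per class
    marker_probs  -- marker_probs[c][j] = P(y_j = 1 | class c)
    All names here are hypothetical and chosen for illustration only.
    """
    log_joint = []
    for weight, probs in zip(class_weights, marker_probs):
        ll = math.log(weight)
        for obs, p in zip(y, probs):
            if obs is None:
                continue  # assumption: a missing marker contributes nothing
            ll += math.log(p if obs == 1 else 1.0 - p)
        log_joint.append(ll)
    # Normalize on the log scale to avoid underflow with many markers.
    top = max(log_joint)
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def penalized_loglik(loglik, coefs, lam, power):
    """Log-likelihood minus lam * sum(|b|^power).

    Assumed penalty family: power=1 gives a LASSO-type penalty,
    power=2 gives a ridge-type penalty.
    """
    return loglik - lam * sum(abs(b) ** power for b in coefs)

# Two classes, three markers, the second marker missing for this subject.
post = posterior_classes(
    [1, None, 0],
    [0.5, 0.5],
    [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]],
)
```

In a full fit, the posterior computation would serve as the E-step of an EM-type algorithm, and the penalized log-likelihood would be maximized over the regression coefficients in the M-step; both steps are omitted here.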
