Local likelihood regression in generalized linear single-index models with applications to microarray data

Searching for an effective dimension reduction space is an important problem in regression, especially for high-dimensional data such as microarray data. A major characteristic of microarray data is the small number of observations n relative to a very large number of genes p. This "large p, small n" paradigm makes discriminant analysis for classification difficult. One way to offset this dimensionality problem is to reduce the dimension. Supervised classification is treated here as a regression problem with a small number of observations and a large number of covariates. A new approach to dimension reduction is proposed, based on a semi-parametric method that uses local likelihood estimates for single-index generalized linear models. The asymptotic properties of this procedure are considered and its asymptotic performance is illustrated by simulations. Applications of the method to binary and multiclass classification on three real data sets (Colon, Leukemia and SRBCT) are presented.
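To make the idea of a local likelihood estimate along a single index concrete, the following is a minimal sketch and not the procedure of the paper. It assumes a binary response with a logistic link, a Gaussian kernel, a local-linear fit, and an index direction `beta` that is already known, whereas the paper's approach has to estimate it. The names `sigmoid`, `local_logistic_fit`, the bandwidth `h` and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(eta):
    # Logistic function, clipped to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))


def local_logistic_fit(t, y, t0, h, n_iter=25, ridge=1e-8):
    """Estimate P(y = 1 | index = t0) by maximizing a kernel-weighted
    Bernoulli log-likelihood with a local-linear predictor a0 + a1 * (t - t0)."""
    w = np.exp(-0.5 * ((t - t0) / h) ** 2)          # Gaussian kernel weights
    Z = np.column_stack([np.ones_like(t), t - t0])  # local-linear design
    a = np.zeros(2)
    for _ in range(n_iter):                         # Newton-Raphson iterations
        mu = sigmoid(Z @ a)
        grad = Z.T @ (w * (y - mu))
        # Small ridge term (an assumption) keeps the Hessian invertible.
        hess = (Z * (w * mu * (1.0 - mu))[:, None]).T @ Z + ridge * np.eye(2)
        step = np.linalg.solve(hess, grad)
        a += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return float(sigmoid(a[0]))                     # fitted value at t0


# Simulated single-index data (assumed setup, not from the paper).
rng = np.random.default_rng(0)
n, p = 400, 3
beta = np.array([1.0, 2.0, -2.0]) / 3.0             # unit-norm index direction
X = rng.normal(size=(n, p))
t = X @ beta                                        # one-dimensional index
y = rng.binomial(1, sigmoid(2.0 * t)).astype(float)

p_low = local_logistic_fit(t, y, t0=-1.0, h=0.5)
p_high = local_logistic_fit(t, y, t0=1.0, h=0.5)
```

Here the p-dimensional covariates enter only through the scalar index `t`, so the smoothing is one-dimensional regardless of p. In a "large p, small n" setting the direction would not be available in advance and would itself have to be estimated, which this sketch does not attempt.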
