Selecting informative genes for discriminant analysis using multigene expression profiles

Background
Gene expression data from microarray experiments are used to study differences in the mRNA abundance of genes under different conditions. A single such experiment measures thousands of genes simultaneously, which provides a high-dimensional feature space for discriminating between sample classes. However, most of these dimensions carry no information about the between-class difference and only add noise to the discriminant analysis.

Results
In this paper we propose and study feature selection methods that evaluate the "informativeness" of a set of genes. We consider two measures of information based on multigene expression profiles and use them in a backward, information-driven screening approach for selecting important gene features. Considering multigene expression profiles lets us exploit interaction information among these genes. Using a breast cancer data set, we illustrate our methods and compare their performance with that of existing methods.

Conclusion
We illustrate that methods that consider gene-gene interactions have better classification power in gene expression analysis. Our results identify important genes with relatively large p-values in single-gene tests. This indicates that these genes carry weak marginal information but strong interaction information, and would be overlooked by strategies that examine only individual genes.
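The idea of backward screening driven by a set-level score can be sketched on synthetic data. The abstract does not define the two information measures, so this sketch substitutes a leave-one-out nearest-neighbour accuracy as a stand-in score; the function names `loo_nn_accuracy` and `backward_screen`, the stopping rule, and the toy data are all assumptions made for illustration, not the paper's method. Genes 0 and 1 are constructed so that the class depends only on their joint sign pattern, which gives each of them weak marginal information but strong interaction information.

```python
import random


def loo_nn_accuracy(X, y, genes):
    """Leave-one-out 1-nearest-neighbour accuracy using only `genes`.

    Stand-in for a set-level information measure (an assumption; the
    paper's own measures are not given in the abstract).
    """
    if not genes:
        return 0.0
    n = len(X)
    correct = 0
    for i in range(n):
        best_j, best_d = None, float("inf")
        for j in range(n):
            if j == i:
                continue
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_d:
                best_d, best_j = d, j
        correct += y[best_j] == y[i]
    return correct / n


def backward_screen(X, y, genes, score=loo_nn_accuracy):
    """Greedy backward elimination (hypothetical stopping rule).

    Repeatedly drop the gene whose removal leaves the highest score,
    and stop as soon as every possible removal would lower the score.
    """
    current = list(genes)
    current_score = score(X, y, current)
    while len(current) > 1:
        candidates = [
            (score(X, y, [g for g in current if g != drop]), drop)
            for drop in current
        ]
        best_score, drop = max(candidates)
        if best_score < current_score:
            break
        current.remove(drop)
        current_score = best_score
    return current, current_score


# Toy data: the class is 1 when genes 0 and 1 have opposite signs, so
# neither gene separates the classes alone. Genes 2-5 are pure noise.
rng = random.Random(0)
X, y = [], []
for s0 in (-1, 1):
    for s1 in (-1, 1):
        for _ in range(10):
            signal = [s0 + rng.uniform(-0.1, 0.1), s1 + rng.uniform(-0.1, 0.1)]
            noise = [rng.uniform(-0.3, 0.3) for _ in range(4)]
            X.append(signal + noise)
            y.append(int(s0 != s1))

kept, kept_score = backward_screen(X, y, range(6))
print("kept genes:", kept, "score:", kept_score)
print("gene 0 alone:", loo_nn_accuracy(X, y, [0]))
print("gene 1 alone:", loo_nn_accuracy(X, y, [1]))
```

In this toy setting the screening discards the four noise genes and retains the interacting pair, while scoring either gene of the pair on its own gives a much lower value than scoring them jointly. A screen based on single-gene scores would therefore have no reason to prefer genes 0 and 1 over the noise genes.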
