GALGO: an R package for multivariate variable selection using genetic algorithms

SUMMARY Developing statistical models that link the molecular state of a cell to its physiology is one of the most important tasks in the analysis of functional genomics data. Because so many variables are measured, an exhaustive evaluation of all variable subsets is not feasible with available computational resources, so an efficient variable selection strategy is required. Software packages for univariate variable selection are available, but a comprehensive software environment for developing and evaluating multivariate statistical models with a multivariate variable selection strategy is still needed. To address this, we developed GALGO, an R package based on a genetic algorithm variable selection strategy and designed primarily for developing statistical models from large-scale datasets.
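To make the idea of genetic algorithm variable selection concrete, the following is a minimal sketch in Python. It is not GALGO's implementation or API. The fixed-size chromosome of variable indices, the nearest-centroid classifier, leave-one-out accuracy as the fitness function, tournament selection, one-point crossover, and all function and parameter names (`make_data`, `loo_accuracy`, `ga_select`) are assumptions chosen for illustration, and the data are synthetic.

```python
import random

def make_data(rng, n_samples=40, n_vars=30, informative=(3, 17)):
    """Synthetic two-class data; only the `informative` columns differ by class."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        for j in informative:
            row[j] += 3.0 * label  # shift class 1 on the informative variables
        X.append(row)
        y.append(label)
    return X, y

def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier using `subset`."""
    correct = 0
    for held in range(len(X)):
        sums, counts = {}, {}
        for i, (row, label) in enumerate(zip(X, y)):
            if i == held:
                continue  # the held-out sample never contributes to a centroid
            acc = sums.setdefault(label, [0.0] * len(subset))
            for k, j in enumerate(subset):
                acc[k] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # squared Euclidean distance from the held-out sample to a class centroid
            return sum((X[held][j] - sums[label][k] / counts[label]) ** 2
                       for k, j in enumerate(subset))

        correct += min(sums, key=dist) == y[held]
    return correct / len(X)

def ga_select(X, y, rng, size=2, pop_size=20, generations=15, mutation_rate=0.1):
    """Search for a subset of `size` (>= 2) variable indices with high fitness."""
    n_vars = len(X[0])

    def fitness(chrom):
        return loo_accuracy(X, y, chrom)

    population = [rng.sample(range(n_vars), size) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = [(fitness(c), c) for c in population]

        def pick():
            # binary tournament: the fitter of two random chromosomes wins
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        children = [list(best)]  # elitism: carry the best chromosome forward
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, size)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            for k in range(size):
                if rng.random() < mutation_rate:
                    # mutation: swap in a random variable (duplicates are tolerated)
                    child[k] = rng.randrange(n_vars)
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    subset, accuracy = ga_select(X, y, rng)
    print("selected variables:", sorted(subset), "LOO accuracy:", accuracy)
```

Each candidate subset is scored as a whole, so the search is multivariate: variables are judged by how well they classify together rather than one at a time. With 30 variables and subsets of size 2 an exhaustive search would still be feasible; the stochastic search only pays off when the number of variables and the subset size make enumeration impractical.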
