Regularized gene selection in cancer microarray meta-analysis

Background
In cancer studies, multiple microarray experiments are often conducted to measure the same clinical outcome and the expression of the same set of genes. An important goal of such experiments is to identify a subset of genes that could serve as predictive markers for cancer development and progression. Analyzing each experiment separately may yield unreliable gene selection because individual experiments have small sample sizes. Meta-analysis can pool multiple experiments, increase statistical power, and achieve more reliable gene selection. Meta-analysis of cancer microarray data is nevertheless challenging because gene expression measurements are high-dimensional and experimental settings differ across experiments.

Results
We propose a Meta Threshold Gradient Descent Regularization (MTGDR) approach for gene selection in the meta-analysis of cancer microarray data. The MTGDR has several advantages over existing approaches: it allows different experiments to have different experimental settings, it accounts for the joint effects of multiple genes on cancer, and it selects the same set of cancer-associated genes across multiple experiments. Simulation studies and analyses of multiple pancreatic and liver cancer experiments demonstrate the superior performance of the MTGDR.

Conclusion
The MTGDR provides an effective way to analyze multiple cancer microarray studies and select reliable cancer-associated genes.
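The abstract gives no equations for the MTGDR. The following is a minimal Python sketch of threshold-gradient-descent-style selection across several studies, and it rests on our own assumptions: a logistic loss per study with no intercept, standardized expression values, and a gene-level "strength" defined as the Euclidean norm of that gene's per-study gradients. The names `mtgdr`, `negative_gradient`, and `selected_genes` are hypothetical, and this is not the authors' published implementation. The sketch shows how the two properties claimed in the Results can coexist. Each study keeps its own coefficient vector, which accommodates different experimental settings. The thresholding decision is shared across studies, so the same genes are selected in every experiment.

```python
import math
from typing import List, Sequence, Tuple

# One study: (rows of gene expression values, binary labels with 1 = tumor, 0 = normal).
Study = Tuple[Sequence[Sequence[float]], Sequence[int]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def negative_gradient(X: Sequence[Sequence[float]], y: Sequence[int],
                      beta: List[float]) -> List[float]:
    """Negative gradient of the mean logistic loss for one study (ASSUMED loss, no intercept)."""
    n, d = len(y), len(beta)
    g = [0.0] * d
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
        r = yi - p  # residual on the probability scale
        for j in range(d):
            g[j] += xi[j] * r
    return [gj / n for gj in g]


def mtgdr(studies: Sequence[Study], tau: float = 0.8, step: float = 0.01,
          n_iter: int = 500) -> List[List[float]]:
    """Hypothetical MTGDR-style sketch: study-specific coefficients, shared gene selection."""
    d = len(studies[0][0][0])
    betas = [[0.0] * d for _ in studies]  # one coefficient vector per study
    for _ in range(n_iter):
        grads = [negative_gradient(X, y, b) for (X, y), b in zip(studies, betas)]
        # ASSUMPTION: a gene's overall strength is the Euclidean norm of its per-study gradients.
        strength = [math.sqrt(sum(g[j] ** 2 for g in grads)) for j in range(d)]
        top = max(strength)
        if top == 0.0:
            break  # all gradients vanish; nothing left to update
        # Threshold step: only genes whose strength is at least tau * max are updated,
        # and they are updated in every study at once.
        active = [s >= tau * top for s in strength]
        for b, g in zip(betas, grads):
            for j in range(d):
                if active[j]:
                    b[j] += step * g[j]
    return betas


def selected_genes(betas: List[List[float]], tol: float = 1e-12) -> List[int]:
    """Indices of genes with a nonzero coefficient in at least one study."""
    return [j for j in range(len(betas[0])) if any(abs(b[j]) > tol for b in betas)]


if __name__ == "__main__":
    # Two toy studies on different scales; only gene 0 separates tumor from normal.
    study_a = ([[1.0, 0.2, -0.1], [1.0, -0.2, 0.1], [-1.0, 0.2, 0.1], [-1.0, -0.2, -0.1]], [1, 1, 0, 0])
    study_b = ([[2.0, 0.5, 0.3], [2.0, -0.5, -0.3], [-2.0, 0.5, -0.3], [-2.0, -0.5, 0.3]], [1, 1, 0, 0])
    betas = mtgdr([study_a, study_b], tau=0.9, step=0.5, n_iter=200)
    print(selected_genes(betas))  # [0]
```

Here `tau` in [0, 1] controls sparsity. With `tau = 1`, only the strongest gene(s) move at each step. With `tau = 0`, the procedure reduces to ordinary gradient descent on all genes. In threshold gradient descent methods, `tau` and the number of iterations are typically tuned, for example by cross-validation. The values in the toy example are arbitrary.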
