Similarity of markers identified from cancer gene expression studies: observations from GEO

Gene expression profiling has been conducted extensively in cancer research. Analyzing multiple independent cancer gene expression datasets may provide additional information and complement single-dataset analysis. In this study, we conduct multi-dataset analysis to evaluate the similarity of cancer-associated genes identified from different datasets. The first objective is to briefly review statistical methods that can be used for such an evaluation, covering both marginal analysis and joint analysis methods. The second objective is to apply those methods to 26 Gene Expression Omnibus (GEO) datasets on five types of cancer. Our analysis suggests that, for the same cancer, marker identification results can vary substantially across datasets, with different datasets sharing few common genes. Datasets on different cancers also share few common genes. The shared genetic basis of datasets on the same or different cancers, which has been suggested in the literature, is not observed in the analysis of GEO data.
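To make the notion of "sharing few common genes" concrete, the following minimal sketch compares marker sets pairwise. It assumes each dataset has already been reduced to a set of identified gene symbols, and it uses the Jaccard index as the similarity measure. Both choices are illustrative assumptions: the abstract does not state which overlap statistic the study uses, and the dataset names and gene lists below are hypothetical toy inputs, not results from the GEO analysis.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: |A and B| / |A or B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_overlap(marker_sets):
    """Return {(name_a, name_b): Jaccard index} for every unordered pair of datasets."""
    return {
        (name_a, name_b): jaccard(marker_sets[name_a], marker_sets[name_b])
        for name_a, name_b in combinations(sorted(marker_sets), 2)
    }


# Hypothetical marker lists (toy data, not taken from the study).
marker_sets = {
    "dataset_A": {"TP53", "BRCA1", "MYC", "EGFR"},
    "dataset_B": {"TP53", "KRAS", "PTEN", "EGFR"},
    "dataset_C": {"CDH1", "ESR1", "MYC", "GATA3"},
}

overlap = pairwise_overlap(marker_sets)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

A value near 0 for most pairs would correspond to the pattern described above, where datasets on the same or different cancers share few identified genes.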
