BMC Medical Genomics: The Removal of Multiplicative, Systematic Bias Allows Integration of Breast Cancer Gene Expression Datasets – Improving Meta-analysis and Prediction of Prognosis

Background: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses.
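As an illustration of why a multiplicative, dataset-specific bias is removable in principle, the sketch below uses one common approach: per-gene mean-centering of log-scale values within each dataset. This is an assumption made for illustration, not a description of the procedure used in this study. A multiplicative factor on the raw scale becomes an additive offset after a log transform, so subtracting each gene's within-dataset mean cancels it. The function name `center_per_dataset`, the dataset labels, and the toy bias factors are all hypothetical.

```python
import numpy as np

def center_per_dataset(log_expr, dataset_ids):
    """Subtract each gene's within-dataset mean from log-scale expression.

    log_expr: array of shape (genes, samples) holding log2 expression values.
    dataset_ids: one label per sample naming the dataset it came from.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    dataset_ids = np.asarray(dataset_ids)
    centered = np.empty_like(log_expr)
    for ds in np.unique(dataset_ids):
        cols = dataset_ids == ds  # boolean mask selecting this dataset's samples
        gene_means = log_expr[:, cols].mean(axis=1, keepdims=True)
        centered[:, cols] = log_expr[:, cols] - gene_means
    return centered

# Toy data (hypothetical): 5 genes, 6 samples split across datasets "A" and "B".
rng = np.random.default_rng(0)
true_signal = rng.normal(8.0, 1.0, size=(5, 6))   # unbiased log2 expression
bias = np.array([2.0, 0.5, 1.5, 4.0, 1.0])        # per-gene multiplicative bias in dataset B
raw = 2.0 ** true_signal
raw[:, 3:] *= bias[:, None]                       # apply the bias to dataset B only
labels = ["A"] * 3 + ["B"] * 3

centered = center_per_dataset(np.log2(raw), labels)
reference = center_per_dataset(true_signal, labels)  # same centering applied to unbiased data
```

In this toy setting `centered` matches `reference`, because the only difference between the biased and unbiased data is a per-gene offset within dataset B. Mean-centering also removes any genuine difference in average expression between datasets, so it is only appropriate when the cohorts are biologically comparable.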
