Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology

High-throughput biological assays such as microarrays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow exact reproduction of the results. This leads to exercises in “forensic bioinformatics,” in which aspects of the raw data and the reported results are used to infer what methods must have been employed. Unfortunately, poor documentation can shift from an inconvenience to an active danger when it obscures not just methods but errors. In this report, we examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate several simple errors that may be putting patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the simplest errors are common. We then discuss steps we are taking to avoid such errors in our own investigations.
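As an illustration of the kind of row-offset check mentioned above, the following is a minimal sketch and not the procedure used in the report. It assumes a hypothetical ordered list of probe identifiers (`reference_ids`) and a hypothetical list of reported (row index, probe identifier) pairs, and counts how many reported pairs agree with the reference under each small index shift. The function name, data, and `max_shift` parameter are all invented for illustration.

```python
def offset_match_counts(reference_ids, reported, max_shift=2):
    """Count, for each shift, how many reported (row, id) pairs match
    reference_ids[row + shift]. All names here are hypothetical."""
    counts = {}
    n = len(reference_ids)
    for shift in range(-max_shift, max_shift + 1):
        hits = 0
        for row, probe_id in reported:
            j = row + shift
            # Ignore shifted positions that fall outside the reference table.
            if 0 <= j < n and reference_ids[j] == probe_id:
                hits += 1
        counts[shift] = hits
    return counts


# Toy data (made up): every reported identifier sits one row below
# the row index it was reported with.
reference_ids = ["p0", "p1", "p2", "p3", "p4", "p5"]
reported = [(0, "p1"), (2, "p3"), (4, "p5")]

counts = offset_match_counts(reference_ids, reported)
best_shift = max(counts, key=counts.get)
print(counts, best_shift)
```

If a nonzero shift explains far more of the reported pairs than a shift of zero, that is a hint (not proof) that an indexing offset crept in somewhere between the raw data and the reported list.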
