Sources of error and its control in studies on the diagnostic accuracy of “‐omics” technologies

Analyses of errors in diagnostic studies have led to improvements in the methodological quality of traditional laboratory research. However, because the features of genomics and proteomics research (“‐omics”) differ from those of traditional research, its sources of error are also likely to be distinct. We examine the sources of error most relevant to “‐omics”‐based diagnostic techniques through an analysis of primary research papers that address these potential errors, their solutions, and the resulting spurious effects on predicted diagnostic accuracy. The main sources of error described in “‐omics”‐based research fall into two groups: those associated with chance (overfitting and inadequate sample size) and those associated with variation, namely preanalytical variation (specimen collection and management), analytical variation (test procedures and reproducibility) and biological variation. We conclude that “‐omics”‐based research is prone to several errors. We have characterized them and shown the range of available solutions. This is a key step in the application of genomic discoveries to clinical and public health practice.
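To make the chance-related errors concrete, the following minimal sketch (not taken from the reviewed papers) simulates a small "‐omics"‐style data set of pure noise and estimates accuracy by leave-one-out cross-validation in two ways: with features selected once on all samples, and with features reselected inside each fold. The sample size, number of features, nearest-centroid classifier, and all function names are illustrative assumptions. Because the data carry no real signal, any apparent accuracy well above chance in the first estimate is an artefact of overfitting.

```python
import random

def class_means(rows, labels, j):
    """Mean of feature j within class 0 and within class 1."""
    sums, counts = [0.0, 0.0], [0, 0]
    for row, y in zip(rows, labels):
        sums[y] += row[j]
        counts[y] += 1
    return sums[0] / counts[0], sums[1] / counts[1]

def select_features(rows, labels, k):
    """Keep the k features with the largest absolute difference in class means."""
    def score(j):
        m0, m1 = class_means(rows, labels, j)
        return abs(m0 - m1)
    return sorted(range(len(rows[0])), key=score, reverse=True)[:k]

def predict(train_rows, train_labels, features, x):
    """Nearest-centroid rule restricted to the chosen features."""
    d0 = d1 = 0.0
    for j in features:
        m0, m1 = class_means(train_rows, train_labels, j)
        d0 += (x[j] - m0) ** 2
        d1 += (x[j] - m1) ** 2
    return 0 if d0 <= d1 else 1

def loo_accuracy(rows, labels, k, select_inside_fold):
    """Leave-one-out accuracy; selection happens inside or outside the folds."""
    fixed = None if select_inside_fold else select_features(rows, labels, k)
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if select_inside_fold:
            feats = select_features(train_rows, train_labels, k)
        else:
            feats = fixed  # chosen using the held-out sample too
        correct += int(predict(train_rows, train_labels, feats, rows[i]) == labels[i])
    return correct / len(rows)

# Hypothetical study: 40 specimens, 500 measured features, no true signal.
rng = random.Random(0)
n_samples, n_features, k = 40, 500, 10
rows = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
labels = [i % 2 for i in range(n_samples)]  # balanced, arbitrary classes

biased = loo_accuracy(rows, labels, k, select_inside_fold=False)
honest = loo_accuracy(rows, labels, k, select_inside_fold=True)
print(f"feature selection outside cross-validation: {biased:.2f}")
print(f"feature selection inside cross-validation:  {honest:.2f}")
```

In this sketch, the gap between the two estimates illustrates why every data-dependent step, including feature selection, should be repeated within each resampling fold, and why many features measured on few specimens make spurious discrimination likely.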
