sbv IMPROVER Diagnostic Signature Challenge

The sbv IMPROVER (systems biology verification: Industrial Methodology for Process Verification in Research) process aims to help companies verify component steps or tasks in larger research workflows for industrial applications. IMPROVER is built on challenges posed to the community that draw on the wisdom of crowds to assess the most suitable methods for a given research task. The Diagnostic Signature Challenge, open to the public from Mar. 5 to Jun. 21, 2012, was the first instantiation of the IMPROVER methodology and addressed a fundamental biological question: whether gene expression data contain sufficient information to diagnose diseases. Fifty-four teams used publicly available data to develop prediction models in four disease areas: multiple sclerosis, lung cancer, psoriasis, and chronic obstructive pulmonary disease. The predictions were scored against unpublished, blinded data provided by the organizers, and the results, including the methods of the top performers, were presented at a conference in Boston on Oct. 2–3, 2012. This paper offers an overview of the Diagnostic Signature Challenge and the accompanying symposium. It is the first article in a special issue of Systems Biomedicine that provides focused reviews of the submitted methods and general conclusions from the challenge. Overall, the optimal choice of method and its performance appeared to depend largely on the endpoint. The psoriasis and lung cancer subtype sub-challenges were predicted more accurately, while the remaining classification tasks proved much more difficult. Although no single approach was superior in every sub-challenge, some methods, such as linear discriminant analysis, performed consistently well across all of them.
