The sbv IMPROVER Systems Toxicology Computational Challenge: Identification of Human and Species-Independent Blood Response Markers as Predictors of Smoking Exposure and Cessation Status.

Cigarette smoking entails chronic exposure to a mixture of harmful chemicals that trigger molecular changes over time, and it is known to increase the risk of developing disease. Risk assessment in the context of 21st-century toxicology relies on elucidating mechanisms of toxicity and identifying exposure response markers, usually from high-throughput data, using advanced computational methodologies. The sbv IMPROVER Systems Toxicology computational challenge (Fall 2015-Spring 2016) aimed to evaluate whether robust and sparse (≤40 genes) human (sub-challenge 1, SC1) and species-independent (sub-challenge 2, SC2) exposure response markers (so-called gene signatures) could be extracted from human and mouse blood transcriptomics data of current (S), former (FS), and never (NS) smoke-exposed subjects as predictors of smoking and cessation status. The best-performing computational methods were identified by scoring the participants' anonymized predictions. Worldwide participation resulted in 12 (SC1) and six (SC2) final submissions that qualified for scoring. The results showed that blood gene expression data were informative for predicting smoking exposure status (i.e., discriminating smokers from never or former smokers) in humans and across species with a high level of accuracy. By contrast, predicting cessation status (i.e., distinguishing FS from NS) remained challenging, as reflected by lower classification performance. Participants successfully developed inductive predictive models and extracted human and species-independent gene signatures, including genes with high consensus across teams. Post-challenge analyses highlighted "feature selection" as a key step in building a classifier and confirmed the importance of testing a gene signature in independent cohorts to ensure that a predictive model generalizes at the population level.
In conclusion, the Systems Toxicology challenge demonstrated the feasibility of extracting a consistent blood-based smoke exposure response gene signature and further stressed the importance of independent and unbiased evaluation of data and methods to provide confidence in scientific conclusions based on systems toxicology.
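To make the workflow above concrete (select a sparse gene signature, build a classifier on it, then score it on an independent cohort), here is a minimal sketch under stated assumptions. It is not any participant's method. Genes are ranked by a plain Welch t-statistic, standing in for more elaborate statistics such as limma's moderated t; the classifier is a nearest-centroid rule; and the score is the Matthews correlation coefficient (MCC). The helpers `simulate_cohort`, `select_genes`, `fit_centroids`, `predict`, and `mcc` are hypothetical names, and the data are synthetic. Labels follow the exposure task: 1 = current smoker (S), 0 = non-smoker (FS and NS pooled). Feature selection uses only the training cohort, so the independent cohort stays unseen until scoring.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes


def simulate_cohort(n_per_class: int, n_genes: int, n_informative: int,
                    shift: float, seed: int) -> Tuple[Matrix, List[int]]:
    """Hypothetical synthetic cohort: class 1 ("smokers") is shifted by
    `shift` on the first `n_informative` genes; all other genes are noise."""
    rng = random.Random(seed)
    X: Matrix = []
    y: List[int] = []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t-statistic for the difference in means of a and b."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def select_genes(X: Matrix, y: Sequence[int], k: int = 40) -> List[int]:
    """Feature selection: keep the k genes with the largest |t| (sparse signature)."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, lab in zip(X, y) if lab == 1]
        neg = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(welch_t(pos, neg)))
    # Stable sort: ties are broken by ascending gene index.
    ranked = sorted(range(len(scores)), key=lambda g: -scores[g])
    return ranked[:k]


def fit_centroids(X: Matrix, y: Sequence[int], genes: List[int]) -> Dict[int, List[float]]:
    """Per-class mean expression of the signature genes (training data only)."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean([r[g] for r in rows]) for g in genes]
    return centroids


def predict(X: Matrix, genes: List[int], centroids: Dict[int, List[float]]) -> List[int]:
    """Assign each sample to the class whose centroid is nearest (squared Euclidean)."""
    preds = []
    for row in X:
        x = [row[g] for g in genes]
        dist = {lab: sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                for lab, c in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Training cohort and an independent test cohort (different seed).
    X_train, y_train = simulate_cohort(40, 100, 5, 3.0, seed=1)
    X_test, y_test = simulate_cohort(30, 100, 5, 3.0, seed=2)
    signature = select_genes(X_train, y_train, k=5)
    centroids = fit_centroids(X_train, y_train, signature)
    print("signature:", sorted(signature))
    print("MCC on independent cohort:", mcc(y_test, predict(X_test, signature, centroids)))
```

The key design choice is that `select_genes` and `fit_centroids` only see the training cohort. If genes were selected on the pooled data, the independent-cohort score would be optimistically biased, and this is the kind of leakage that testing in independent cohorts is meant to expose.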
