A Qualitative Modeling Approach for Whole Genome Prediction Using High-Throughput Toxicogenomics Data and Pathway-Based Validation

Efficient high-throughput transcriptomics (HTT) tools promise inexpensive, rapid assessment of the possible biological consequences of human and environmental exposures to the tens of thousands of chemicals in commerce. HTT systems have used relatively small sets of gene expression measurements, coupled with mathematical prediction methods, to estimate genome-wide gene expression, and they are often trained and validated using pharmaceutical compounds. It is unclear whether these training sets are suitable for general toxicity testing applications and for the more diverse chemical space represented by commercial chemicals and environmental contaminants. In this work, we built predictive computational models that infer whole-genome transcriptional profiles from a smaller set of surrogate genes. The models were trained and validated using a large-scale toxicogenomics database, Open TG-GATEs, which contains gene expression data from exposures to heterogeneous chemicals from a wide range of classes. The method of predictor selection was designed to allow high-fidelity gene prediction from any pre-existing gene expression data set, regardless of animal species or data measurement platform. Predictive qualitative models were developed with the Open TG-GATEs gene expression data for human primary hepatocytes, which comprised over 941 samples covering 158 compounds. A sequential forward search-based greedy algorithm, combining different fitting approaches and machine learning techniques, was used to find an optimal set of surrogate genes that predicted differential expression changes in the remainder of the genome. We then used pathway enrichment of up-regulated and down-regulated genes to assess the ability of a limited gene set to capture relevant patterns of tissue response.
In addition, we compared the prediction performance of the surrogate genes identified by our greedy algorithm (referred to as the SV2000) with that of the landmark genes provided by existing technologies such as L1000 (Genometry) and S1500 (Tox21), and found better predictive performance for the SV2000. The ability of these algorithms to predict pathway-level responses is a positive step toward incorporating mode of action (MOA) analysis into the high-throughput prioritization and testing of the large number of chemicals in need of safety evaluation.
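
As an illustration of the sequential forward search idea described above, the following minimal Python sketch greedily adds one surrogate gene at a time, each time choosing the gene that most improves how well the selected set represents the remaining genes. It is a sketch under stated assumptions, not the procedure used to derive the SV2000: the scoring criterion (mean best absolute Pearson correlation between each remaining gene and any selected surrogate) stands in for the fitting approaches and machine learning techniques mentioned in the abstract, the data are synthetic, and all function and variable names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def coverage(abs_corr, selected):
    """Mean, over non-selected genes, of the best |correlation| with any surrogate.

    Assumption: this score is a simple proxy for prediction quality.
    """
    targets = [g for g in range(len(abs_corr)) if g not in selected]
    if not targets or not selected:
        return 0.0
    return sum(max(abs_corr[t][s] for s in selected) for t in targets) / len(targets)


def greedy_forward_search(profiles, n_surrogates):
    """Sequential forward search: add the gene that maximizes coverage at each step."""
    n = len(profiles)
    abs_corr = [[abs(pearson(profiles[i], profiles[j])) for j in range(n)]
                for i in range(n)]
    selected = []
    for _ in range(n_surrogates):
        candidates = [g for g in range(n) if g not in selected]
        best = max(candidates, key=lambda g: coverage(abs_corr, selected + [g]))
        selected.append(best)
    return selected, coverage(abs_corr, selected)


# Synthetic demo (hypothetical data): 3 co-expression modules of 5 genes each,
# measured across 40 samples. Gene index g belongs to module g // 5.
rng = random.Random(0)
n_samples, n_modules, genes_per_module = 40, 3, 5
profiles = []
for _ in range(n_modules):
    driver = [rng.gauss(0, 1) for _ in range(n_samples)]
    for _ in range(genes_per_module):
        profiles.append([d + rng.gauss(0, 0.2) for d in driver])

surrogates, score = greedy_forward_search(profiles, n_surrogates=3)
print("selected surrogate genes:", surrogates)
print("coverage score:", round(score, 3))
```

On data with a clear module structure like this, the greedy search tends to pick one representative per co-expression module, which is the intuition behind using a small surrogate set to stand in for the wider genome. A faithful evaluation would score surrogate sets on held-out samples rather than on the data used for selection.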
