Combined Transcriptomics Analysis for Classification of Adverse Effects As a Potential End Point in Effect Based Screening.

Environmental risk assessment relies on bioassays to assess the environmental impact of chemicals. Gene expression is gaining acceptance as a valuable mechanistic end point in bioassays and effect-based screening. The results of gene expression data analysis, however, are complex and often not directly applicable in risk assessment. Classifier analysis is a promising method for turning complex gene expression results into answers suitable for risk assessment. We assembled a large gene expression data set from multiple studies and experiments in the springtail Folsomia candida, with the aim of selecting a set of genes on which classifiers can be trained to recognize general toxic stress. By performing differential expression analysis prior to classifier training, we were able to select a set of 135 genes that was enriched in stress-related processes. Classifier models built from this set were used to classify two test sets composed of chemical-spiked, polluted, and clean soils, and their performance was compared with that of models built from a gene set chosen by a more traditional classifier feature selection method. The gene set presented here outperformed the more traditionally selected gene set. This gene set has the potential to be used as a biomarker to test for adverse effects caused by chemicals in springtails and to provide end points in environmental risk assessment.
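The two-step idea described above (filter genes by differential expression, then train a classifier on the retained genes only) can be illustrated with a minimal sketch. This is not the authors' pipeline. The abstract names neither the differential expression test nor the classifier, so a Welch t-statistic and a nearest-centroid classifier stand in for them here as assumptions. The data are synthetic, and all function names (`welch_t`, `select_genes`, `fit_centroids`, `predict`, `simulate`) are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic between two lists of expression values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def select_genes(X, y, k):
    """Rank genes by |t| between stressed (1) and control (0) samples; keep the top k."""
    scores = []
    for g in range(len(X[0])):
        stressed = [row[g] for row, lab in zip(X, y) if lab == 1]
        control = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append((abs(welch_t(stressed, control)), g))
    return sorted(g for _, g in sorted(scores, reverse=True)[:k])


def fit_centroids(X, y, genes):
    """Mean expression profile per class, restricted to the selected genes."""
    centroids = {}
    for label in set(y):
        rows = [[row[g] for g in genes] for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(col) for col in zip(*rows)]
    return centroids


def predict(row, genes, centroids):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    v = [row[g] for g in genes]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
    )


def simulate(n_samples, n_genes, responsive, rng):
    """Synthetic log-expression data (assumption): responsive genes shift by +3 under stress."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate control (0) and stressed (1) samples
        X.append([
            rng.gauss(3.0 * label if g in responsive else 0.0, 1.0)
            for g in range(n_genes)
        ])
        y.append(label)
    return X, y


rng = random.Random(0)
responsive = {0, 1, 2, 3, 4}  # hypothetical stress-responsive genes
X_train, y_train = simulate(40, 50, responsive, rng)
X_test, y_test = simulate(20, 50, responsive, rng)

# Step 1: differential expression filter, computed on the training data only.
genes = select_genes(X_train, y_train, k=5)
# Step 2: train the classifier on the selected genes and evaluate on held-out samples.
centroids = fit_centroids(X_train, y_train, genes)
predictions = [predict(row, genes, centroids) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)

print("selected genes:", genes)
print("test accuracy:", accuracy)
```

Selecting genes on the training data alone keeps the test set independent of the feature selection step, which is what makes a comparison between two gene sets on held-out soils meaningful.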
