Recursive feature elimination in random forest classification supports nanomaterial grouping

Abstract. Nanomaterials (NMs) can be produced in numerous variants of the same chemical substance. An in-depth safety assessment of each variant by generating test data will simply not be feasible. Thus, NM grouping approaches that significantly reduce the time and amount of testing needed for novel NMs are urgently required. However, identifying structurally similar NM variants remains challenging because many physico-chemical properties could be relevant. Here, we aimed to highlight the value of machine learning models in NM grouping through a case study on eleven selected, well-characterized NMs. To that end, we linked physico-chemical properties of these NMs to characterized hallmarks of inhalation toxicity. We applied unsupervised and supervised machine learning techniques to determine which combination of properties is most predictive. First, we assessed NM similarity in an unsupervised manner using principal component analysis (PCA), followed by superposition of activity labels combined with a k-nearest neighbors approach. Then, we used random forests (RFs) as a supervised machine learning technique that directly uses knowledge of the activity class when defining NM similarity. Similarity was thus defined only on those properties that correlated most strongly with activity and therefore had the highest discriminative power. To improve performance, we then used recursive feature elimination (RFE) to remove uninformative features that bias the results. The best performance was achieved by the reduced RF model based on RFE, which obtained a balanced accuracy of 0.82. Out of eleven different properties, we determined zeta potential, redox potential and dissolution rate to have the strongest predictive impact on biological NM activity in the present dataset.
Although the dataset is too small with respect to the number of NMs studied, and the applicability domain is expected to be very limited because only a few material classes were covered, our study demonstrates how machine learning and feature selection methods can be implemented to identify the physico-chemical NM properties most relevant to toxicity. We suggest that once the most relevant properties have been detected in a model built on a sufficient number of different NMs and across multiple NM classes, they should receive special emphasis in future grouping approaches.
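
The workflow described above (a PCA plus k-nearest neighbors baseline, then a random forest reduced by RFE and scored by balanced accuracy) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The data are synthetic, and the sample size, the generic property names, the hyperparameters, and the 5-fold cross-validation scheme are all assumptions made for the example; only the count of eleven properties and the reduction to three features mirror the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data. 40 samples and 11 features, of which
# 3 are informative; these are NOT the measured NM properties from the study.
X, y = make_classification(
    n_samples=40, n_features=11, n_informative=3, n_redundant=0,
    random_state=0,
)
# Hypothetical placeholder names for the eleven properties.
feature_names = [f"property_{i}" for i in range(X.shape[1])]

# ASSUMPTION: stratified 5-fold cross-validation as the evaluation scheme.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: scale, project onto two principal components, classify with kNN.
knn_model = make_pipeline(
    StandardScaler(), PCA(n_components=2), KNeighborsClassifier(n_neighbors=3)
)
bal_acc_knn = balanced_accuracy_score(y, cross_val_predict(knn_model, X, y, cv=cv))

# Reduced RF: RFE drops the least important feature one at a time (ranked by
# the forest's feature_importances_) until three remain, then a fresh forest
# is trained on the survivors. Keeping RFE inside the pipeline means the
# selection is redone within each training fold, so held-out samples never
# influence which features are kept.
rf_rfe_model = make_pipeline(
    RFE(RandomForestClassifier(n_estimators=100, random_state=0),
        n_features_to_select=3, step=1),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
bal_acc_rf = balanced_accuracy_score(y, cross_val_predict(rf_rfe_model, X, y, cv=cv))

# Refit the selector on all data to report which features survive elimination.
selector = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
               n_features_to_select=3, step=1).fit(X, y)
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]

print(f"PCA + kNN balanced accuracy: {bal_acc_knn:.2f}")
print(f"RF + RFE balanced accuracy:  {bal_acc_rf:.2f}")
print("Features kept by RFE:", selected)
```

Balanced accuracy is the mean of the per-class recalls, which makes it less sensitive to class imbalance than plain accuracy. The scores printed here describe the synthetic data only and are not comparable to the 0.82 reported in the abstract.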
