Expectation-Maximization Model for Substitution of Missing Values Characterizing Greenness of Organic Solvents

Organic solvents are ubiquitous in chemical laboratories, and the Green Chemistry trend calls for detailed assessment of their greenness. Unfortunately, some solvents are not fully characterized, especially with respect to toxicological endpoints, which are time-consuming and expensive to determine. Missing values in datasets are a serious obstacle because they prevent the full greenness characterization of chemicals. One method for dealing with this problem is the Expectation-Maximization algorithm. In this study, a dataset of 155 solvents characterized by 13 variables is treated with the Expectation-Maximization algorithm to predict missing toxicological endpoint, bioavailability, and biodegradability data. The approach may be particularly useful for substituting missing values of environmental, health, and safety parameters of new solvents. It also has high potential for handling missing values when assessing the environmental, health, and safety parameters of other chemicals.
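
As an illustration of how Expectation-Maximization can substitute missing values, the following is a minimal sketch, not the study's actual implementation. The abstract does not state the underlying statistical model, so the sketch assumes the variables are jointly multivariate normal. The function name `em_impute` and its parameters `n_iter` and `tol` are hypothetical. The E-step replaces each missing entry with its conditional expectation given the observed entries of the same row; the M-step re-estimates the mean vector and covariance matrix from the completed data.

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-6):
    """Fill NaNs in X (rows = solvents, columns = variables) by EM,
    assuming (hypothetically) a multivariate normal model."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)

    # Initialise with column means of the observed entries.
    mu = np.nanmean(X, axis=0)
    X_hat = np.where(missing, mu, X)
    sigma = np.cov(X_hat, rowvar=False, bias=True) + 1e-6 * np.eye(p)

    for _ in range(n_iter):
        X_prev = X_hat.copy()
        C = np.zeros((p, p))  # summed conditional covariance of missing parts

        # E-step: conditional expectation of missing entries, row by row.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the mean.
                X_hat[i] = mu
                C += sigma
                continue
            S_oo = sigma[np.ix_(o, o)]
            S_mo = sigma[np.ix_(m, o)]
            B = S_mo @ np.linalg.pinv(S_oo)  # regression of missing on observed
            X_hat[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ S_mo.T

        # M-step: update mean and covariance from the completed data.
        mu = X_hat.mean(axis=0)
        D = X_hat - mu
        sigma = (D.T @ D + C) / n

        # Stop once the imputed values no longer change appreciably.
        if np.max(np.abs(X_hat - X_prev)) < tol:
            break

    return X_hat
```

Under these assumptions, a 155 x 13 solvent matrix with `NaN` for unmeasured endpoints would be passed as `X`. In practice, variables on very different scales would likely need to be transformed or standardized first, since the normality assumption is rarely exact for toxicological data.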
