When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler?

In this report, we introduce a set of aggregation operators (AOs) to calculate global and local (group and atom-type) molecular descriptors (MDs), generalizing the classical approach of encoding a molecule as the sum of its atomic (or fragment) contributions. These AOs are implemented in a new, free software package named MD-LOVIs (http://tomocomd.com/md-lovis), which calculates MDs from atomic weight vectors and LOVIs (local vertex invariants). The software was developed in the Java programming language and uses the Chemistry Development Kit (CDK) library to handle chemical structures and calculate atomic weights. A complexity analysis of the algorithms presented herein indicates that they were implemented efficiently. Calculation speed experiments show that MD-LOVIs performs satisfactorily compared with PaDEL, CDKDescriptor, DRAGON and Bluecal. Shannon's entropy (SE)-based variability studies demonstrate that MD-LOVIs yields indices with greater information content than those of popular academic and commercial software. A principal component analysis reveals that our approach captures chemical information orthogonal to that codified by the DRAGON, PaDEL and Mold2 software, as a result of the several generalizations in MD-LOVIs that are not used in other programs. Lastly, three QSAR models were built using multiple linear regression with genetic algorithms, and their statistical parameters demonstrate that the MD-LOVIs indices obtained with AOs perform better than those obtained when the summation operator is used exclusively. Moreover, despite their simplicity, the MD-LOVIs indices yield models with performance comparable or superior to that of other QSAR methodologies reported in the literature.
Collectively, the studies performed herein demonstrate that the MD-LOVIs software generates indices that are as simple as possible, but not simpler, and that the use of AOs enhances the diversity of the chemical information codified, which consequently improves the performance of traditional MDs.
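
The central idea, replacing the plain sum of atomic contributions with other aggregation operators, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MD-LOVIs implementation (which is written in Java). The operator set, the function names `global_descriptor` and `local_descriptor`, and the LOVI values are all hypothetical; the abstract does not list the AOs the software actually provides.

```python
import math
from statistics import fmean, pstdev

# Illustrative operators only (assumption): the abstract does not enumerate
# the aggregation operators implemented in MD-LOVIs.
OPERATORS = {
    "sum": math.fsum,  # the classical approach
    "mean": fmean,
    "euclidean_norm": lambda v: math.sqrt(math.fsum(x * x for x in v)),
    "std": pstdev,  # population standard deviation
    "range": lambda v: max(v) - min(v),
}


def global_descriptor(lovis, operator="sum"):
    """Aggregate all local vertex invariants (one per atom) into one value."""
    if not lovis:
        raise ValueError("at least one LOVI is required")
    return OPERATORS[operator](lovis)


def local_descriptor(lovis, atom_symbols, atom_type, operator="sum"):
    """Aggregate only the LOVIs of atoms with the given element symbol."""
    selected = [v for v, s in zip(lovis, atom_symbols) if s == atom_type]
    # Hypothetical convention: a molecule without that atom type scores 0.0.
    return global_descriptor(selected, operator) if selected else 0.0


# Hypothetical LOVIs for the heavy atoms of a three-atom molecule (C, C, O).
symbols = ["C", "C", "O"]
lovis = [1.0, 2.0, 3.0]

for name in OPERATORS:
    print(name, global_descriptor(lovis, name))
print("C-only sum", local_descriptor(lovis, symbols, "C"))
```

With the summation operator the molecule collapses to a single total, whereas operators such as the standard deviation or the range respond to how the contributions are distributed over the atoms, which is one way different operators can encode different information from the same LOVI vector.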
