Paper information - Canonical Measure of Correlation (CMC) and Canonical Measure of Distance (CMD) between sets of data. Part 1. Theory and simple chemometric applications.

The similarity/diversity of objects has been widely studied in different research fields, and a number of distance measures for estimating diversity between objects have been proposed. However, little attention has been paid to analysing how diverse the configurations of objects are in two different multivariate spaces. Because computerisation and automation now make large amounts of information available, a system can be described in different ways and, consequently, methods for comparing the different viewpoints are required. These methods may, for instance, be usefully applied to Quantitative Structure-Activity Relationship (QSAR) studies. In this field, several thousand molecular descriptors have been proposed in the literature, and different selections of descriptors define different chemical spaces that need to be compared. Moreover, variable selection techniques such as Genetic Algorithms, Simulated Annealing, and Tabu Search are widely used to process the available information in order to select optimal QSAR models. When more than one optimal model results, the problem is how to compare these models to find out whether they are really diverse or are based on descriptors that explain almost the same information. In this paper, novel indices are proposed to measure similarity/diversity between pairs of data sets with the aid of the variable cross-correlation matrix.
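The abstract does not define the CMC and CMD indices themselves, so the following is only a minimal sketch of the ingredient it names: the variable cross-correlation matrix between two data sets that describe the same objects. The function names `pearson` and `cross_correlation` and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences.

    Assumes neither sequence is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(block_a, block_b):
    """Cross-correlation matrix between the variables of two data sets.

    Both blocks are lists of rows (objects) over the same objects; the result
    has one row per variable of block_a and one column per variable of block_b.
    """
    cols_a = list(zip(*block_a))  # transpose: one tuple per variable
    cols_b = list(zip(*block_b))
    return [[pearson(ca, cb) for cb in cols_b] for ca in cols_a]


# Hypothetical toy data: 4 objects described by two different descriptor sets.
block_a = [[1, 2], [2, 1], [3, 4], [4, 3]]
block_b = [[2, 5], [4, 1], [6, 2], [8, 7]]

r_ab = cross_correlation(block_a, block_b)
for row in r_ab:
    print([round(value, 3) for value in row])
```

Entries close to 1 or -1 indicate variables in the two sets that carry almost the same information, which is the situation the abstract describes for competing QSAR models built on different descriptors. How these entries are condensed into a single index is left to the paper.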

R Todeschini | A Manganaro | A Mauri | V Consonni | D Ballabio
