Molecular challenges in modern chemometrics

Since its beginnings, chemometrics has focussed mainly on problems in analytical chemistry, such as calibration. With the growing importance of databases and of applications in medicinal and computational chemistry, the domains of analytical chemistry and chemometrics have expanded significantly in recent years. The relation between molecular structure and function, in particular, has attracted considerable interest. Despite the huge quantities of data now available, it is often difficult to recognise and extract the chemical information relevant to the problem at hand. One of the main obstacles is defining an appropriate representation of a molecule: although a variety of representations are in use, none is generally applicable. This paper focuses on the challenges that arise in the chemometrical analysis of molecular structures, on the relation between structure and function, and on the relation between molecular representation and chemometrical modelling. Exciting opportunities for further research are illustrated with an example concerning the prediction of co-crystallisation behaviour of small organic molecules with cephalosporin antibiotics. ©1999 Elsevier Science B.V. All rights reserved.
