Prediction of enantioselectivity using chirality codes and Classification and Regression Trees

Abstract: This paper presents a new application of Classification and Regression Trees (CART): the prediction of enantioselectivity. The data consist of the elution order of enantiomers separated by High-Performance Liquid Chromatography on two different chiral stationary phases. Before the models were built, the enantiomers in both datasets were assigned to one of two groups, First or Last, according to their elution order. CART was then used to build classification trees that predict the elution order of the compounds, with chirality codes as explanatory variables. Chirality codes are a set of molecular descriptors that combine different parameters and are able to distinguish between enantiomers. This approach yielded simple models and good predictions for both datasets. The CART models were then compared with results from Kohonen Neural Networks. The methodology was also applied to predict the quality of the separation between two enantiomers on a given chiral stationary phase. Prior to model construction, the molecules of one of the datasets were assigned to three classes (Bad, Good and Very Good) according to their degree of separation (α), and the model was built using the absolute values of the chirality codes. The results obtained with the final classification tree were promising.
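To illustrate the kind of model described above, the following is a minimal sketch of a classification tree grown by Gini impurity, written in plain Python. It is not the authors' implementation: a full CART procedure also prunes the tree and validates it, which is omitted here. The descriptor vectors are invented toy values, not real chirality codes, and the choice to give each enantiomer the sign-inverted vector of its mirror image is an assumption made only for this example. The function names (`gini`, `best_split`, `grow`, `predict`) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature_index, threshold) of the best binary split,
    or None if no feature has two distinct values."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow(X, y, depth=0, max_depth=3):
    """Grow a tree recursively. A leaf is a class label (str);
    an internal node is a tuple (feature_index, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return majority  # no split reduces impurity
    _, j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (
        j,
        t,
        grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Toy data (hypothetical): four descriptor vectors and their sign-inverted
# counterparts, standing in for pairs of enantiomers.
first_eluted = [[0.8, -0.1, 0.3], [0.5, 0.2, -0.4], [0.9, 0.4, 0.1], [0.3, -0.6, 0.2]]
last_eluted = [[-v for v in row] for row in first_eluted]
X = first_eluted + last_eluted
y = ["First"] * 4 + ["Last"] * 4

tree = grow(X, y)
print(tree)
print(predict(tree, [0.7, 0.0, 0.0]))
```

On this toy set a single split on the first descriptor separates the two classes, which mirrors the abstract's remark that the resulting models were simple. Real datasets would need a held-out test set or cross-validation to judge predictive ability.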
