Optimal Selection of Support Vector Regression Parameters and Molecular Descriptors for Retention Indices Prediction

A quantitative structure-retention relationship (QSRR) approach was used to predict the retention indices of compounds in gas chromatography. A set of 252 compounds with boiling point (BP) data was extracted from the Molecular Operating Environment (MOE) database. After molecular descriptors were calculated for all compounds, a genetic algorithm (GA) was used to select an optimal subset of the descriptors. We investigated the predictive performance of four methods: GA with multiple linear regression (GA-MLR); support vector regression (SVR) trained on the descriptor subset selected by GA-MLR (GA-MLR-SVR); GA with SVR (GA-SVR); and GA with SVR and optimization of the SVR parameters (GA-SVR-Para). Twenty in silico experiments were conducted for each method. The results show that GA-SVR and GA-SVR-Para have better predictive performance, with small variation. Among the four QSRR models, GA-SVR-Para achieved the best performance, with R² > 0.98.
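The GA-based descriptor selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (not the 252 MOE compounds), the function names `cv_rmse` and `ga_select` and all GA settings (population size, mutation rate, tournament selection, uniform crossover) are assumptions chosen for illustration, and the fitness uses a cross-validated MLR fit so that only NumPy is needed. A GA-SVR variant would swap an SVR model into the fitness function, and a GA-SVR-Para variant would additionally encode the SVR parameters in each chromosome.

```python
import numpy as np

def cv_rmse(X, y, mask, k=5):
    """k-fold cross-validated RMSE of an MLR model on the masked descriptors."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf  # an empty descriptor subset cannot predict anything
    n = len(y)
    folds = np.array_split(np.arange(n), k)  # contiguous folds; rows assumed unordered
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Design matrices with an intercept column
        A_train = np.column_stack([np.ones(train.size), X[np.ix_(train, cols)]])
        A_test = np.column_stack([np.ones(test.size), X[np.ix_(test, cols)]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        sq_err += np.sum((A_test @ coef - y[test]) ** 2)
    return float(np.sqrt(sq_err / n))

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Search for a descriptor subset (boolean mask) with low cross-validated RMSE."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    # Each chromosome is a boolean mask over the descriptors
    pop = rng.random((pop_size, n_desc)) < 0.3
    best_mask, best_fit = pop[0].copy(), np.inf
    for _ in range(generations):
        fit = np.array([cv_rmse(X, y, m) for m in pop])
        i = int(np.argmin(fit))
        if fit[i] < best_fit:
            best_fit, best_mask = float(fit[i]), pop[i].copy()
        # Binary tournament selection: the fitter of two random individuals wins
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fit[a] <= fit[b], a, b)]
        # Uniform crossover with a randomly assigned partner
        partners = parents[rng.permutation(pop_size)]
        cross = rng.random((pop_size, n_desc)) < 0.5
        children = np.where(cross, parents, partners)
        # Bit-flip mutation
        children ^= rng.random((pop_size, n_desc)) < p_mut
        children[0] = best_mask  # elitism: keep the best mask found so far
        pop = children
    return best_mask, best_fit

# Synthetic stand-in data: 12 hypothetical descriptors, 3 of them informative
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 12))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 1.5 * X[:, 7] + rng.normal(scale=0.1, size=120)

mask, rmse = ga_select(X, y)
print("selected descriptors:", np.flatnonzero(mask), "CV RMSE:", round(rmse, 3))
```

Scoring each subset by cross-validated error rather than training error is a design choice in this sketch; it penalizes masks that add uninformative descriptors.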
