Support vector machine and the heuristic method to predict the solubility of hydrocarbons in electrolyte.

The support vector machine (SVM), a relatively new method, and the heuristic method (HM) were used to develop nonlinear and linear models, respectively, relating the solubility of 217 nonelectrolytes in an electrolyte solution containing sodium chloride to three molecular descriptors. The descriptors, which represent the structural features of the compounds, comprise two topological descriptors and one electrostatic descriptor. The three descriptors selected by HM in CODESSA were used as inputs for SVM. Both HM and SVM gave satisfactory results. The HM model gave a correlation coefficient (R) of 0.980 and a root-mean-square (RMS) error of 0.219 for the test set. The same descriptors were also used to build a model for pure water, and its predictions were consistent with the experimental solubilities. The SVM model gave a predictive correlation coefficient R of 0.988 and an RMS error of 0.170 for the test set, in very good agreement with the experimental values. This paper provides a new and effective method for predicting solubility in electrolyte solutions and offers some insight into the structural features related to the solubility of nonelectrolytes.
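
The two test-set statistics quoted for each model, R and RMS error, can be illustrated with a minimal sketch. It assumes the common definitions (Pearson correlation between experimental and predicted values, and the square root of the mean squared residual); the abstract does not state which variant was used, and some QSPR software divides by degrees of freedom instead of the number of compounds. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rms_error(observed, predicted):
    """Root-mean-square error: sqrt(mean((observed - predicted)^2)).

    Assumption: the divisor is the number of compounds n, not n minus
    the number of fitted parameters.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient R between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("R is undefined when either input is constant")
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Hypothetical solubility values, for illustration only.
    experimental = [-1.0, -2.0, -3.0, -4.0]
    model_output = [-1.1, -1.9, -3.2, -3.8]
    print("R   =", correlation_coefficient(experimental, model_output))
    print("RMS =", rms_error(experimental, model_output))
```

Applying both functions to the same test set for each model gives a like-for-like comparison: a higher R together with a lower RMS error, as reported for SVM relative to HM, indicates the better fit.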