Sample-expand method for predicting the specified structure of microporous aluminophosphate

Abstract Imbalanced data sets arise in many real-world fields, and the problem has received growing attention in recent years. In this paper, a sample-expand method is proposed as a data pre-processing procedure to improve predictive performance for zeolite synthesis on an imbalanced data set. First, the pre-processing step expands the samples by exploring the marginal structure of the given data set with the k-nearest neighbor algorithm (KNN). Then, the expanded data set is passed to a support vector machine (SVM) for classification. Finally, a Q-times n-fold cross-validation (CV) procedure is adopted to assess the prediction performance. The advantage of the pre-processing step is that it yields a stable data set for building the training model and conforms to the classification criteria of SVM, so that improved predictive performance is achievable. Other classical machine learning methods are also applied to the same prediction task for comparison. The comparative experimental results demonstrate that the SVM method reaches very satisfactory predictive accuracy on the pre-processed data set. In particular, a phase diagram of the gel composition is provided to guide subsequent rational synthesis experiments.
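The three stages named in the abstract (KNN-based sample expansion, classification, Q-times n-fold CV) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the expansion step is a SMOTE-like interpolation between nearest minority neighbours, swapped in because the abstract does not spell out how the marginal structure is explored; a nearest-centroid classifier stands in for the SVM so that the sketch needs only the Python standard library; and the expansion is applied to the training folds only, which is a choice of this sketch since the abstract does not say where the authors expand. All function names and the toy data are hypothetical.

```python
import math
import random


def knn_expand(minority, k=3, n_new=10, seed=0):
    """SMOTE-like stand-in (assumption): each synthetic point is interpolated
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        return []
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, nb)))
    return synthetic


def repeated_kfold(n_samples, n_folds, q_repeats, seed=0):
    """Yield (train_idx, test_idx) for Q independently shuffled n-fold splits."""
    rng = random.Random(seed)
    for _ in range(q_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::n_folds] for i in range(n_folds)]
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test


def fit_centroids(X, y):
    """Nearest-centroid model, a simple stand-in for the SVM (assumption)."""
    centroids = {}
    for label in set(y):
        pts = [p for p, l in zip(X, y) if l == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return centroids


def predict(centroids, x):
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def evaluate(X, y, minority_label, n_folds=5, q_repeats=3, seed=0):
    """Mean accuracy over Q x n folds, expanding the minority class
    inside each training fold until the two classes are balanced."""
    accs = []
    splits = repeated_kfold(len(X), n_folds, q_repeats, seed)
    for r, (train, test) in enumerate(splits):
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        minority = [p for p, l in zip(tr_X, tr_y) if l == minority_label]
        n_new = max(len(tr_X) - 2 * len(minority), 0)  # majority - minority
        new = knn_expand(minority, n_new=n_new, seed=seed + r)
        model = fit_centroids(tr_X + new, tr_y + [minority_label] * len(new))
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        accs.append(hits / len(test))
    return sum(accs) / len(accs)


# Made-up toy data: 20 majority points near (0, 0), 5 minority points near (3, 3).
_rng = random.Random(1)
X = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(20)]
X += [(_rng.gauss(3, 1), _rng.gauss(3, 1)) for _ in range(5)]
y = [0] * 20 + [1] * 5

print("mean accuracy over Q x n folds:", evaluate(X, y, minority_label=1))
```

Keeping the synthetic points out of the test folds means the reported accuracy is measured on original samples only; a real SVM implementation could replace `fit_centroids` and `predict` without changing the rest of the loop.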
