Prediction of Microporous Aluminophosphate AlPO4‐5 Based on Resampling Using Partial Least Squares and Logistic Discrimination

In this paper, Partial Least Squares (PLS) regression and Logistic Discrimination (LD) are employed to predict the formation of microporous aluminophosphate AlPO4‐5 from a database of AlPO syntheses, with the aim of providing useful guidance for the rational synthesis of microporous materials and of other inorganic crystalline materials. To deal with the problem of class imbalance, four guided resampling methods are proposed that consider not only the between‐class imbalance but also the within‐class imbalance. Experimental results indicate that the presented methods are capable of predicting the formation of microporous aluminophosphate AlPO4‐5. In particular, the proposed resampling methods give better predictive results than several existing resampling methods.
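To make the between‐class imbalance problem concrete, the following minimal sketch shows plain random oversampling, a generic baseline of the kind the proposed methods are compared against. It is not one of the four guided resampling methods of this paper, which additionally address within‐class imbalance. The function name `random_oversample`, the toy feature values, and the label coding (1 for AlPO4‐5 formed, 0 for any other product) are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (generic baseline only)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Rows of this class in the original data, used as the sampling pool.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: two made-up synthesis descriptors per record.
# Label 1 = AlPO4-5 formed (minority), 0 = other product (majority).
X = [[0.1, 1.0], [0.2, 1.1], [0.9, 0.3], [1.0, 0.2], [1.1, 0.4], [0.8, 0.1]]
y = [1, 1, 0, 0, 0, 0]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After balancing, both classes contain the same number of records, so a classifier such as PLS followed by logistic discrimination is no longer dominated by the majority class. Because this baseline only copies existing minority records, it does nothing about within‐class imbalance.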
