Feature Selection Method Based on Partial Least Squares and Analysis of Traditional Chinese Medicine Data

Partial least squares has many advantages in multivariable linear regression, but it does not perform feature selection: it cannot screen for the best feature subset (referred to in this study as the “Gold Standard”) or optimize the model. By contrast, the L1 norm yields a sparse representation of the parameters and thereby performs feature selection. This study proposes a feature selection method based on partial least squares. The new method uses partial least squares to extract the latent variables required for multivariable linear regression and applies an L1 regularization term, which constrains the sum of the absolute values of the regression coefficients. It is then combined with the coordinate descent method, iterating multiple times to select a better feature subset. In experiments on traditional Chinese medicine data and University of California, Irvine (UCI) datasets, the proposed method adapts well to both.
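To make the L1 and coordinate descent ingredients concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the standard lasso objective (1/(2n))‖y − Xb‖² + λ‖b‖₁ on standardized features and omits the partial least squares latent-variable extraction entirely. The function names (`soft_threshold`, `lasso_coordinate_descent`, `select_features`) and the parameter `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1.

    Illustrative assumption: this is the plain lasso, not the PLS-based
    method described in the abstract.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature curvature
    residual = y - X @ b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the residual that excludes j's own fit.
            rho = X[:, j] @ residual / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                residual -= X[:, j] * delta  # keep the residual in sync
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def select_features(X, y, lam):
    """Standardize, fit, and return indices of features with nonzero coefficients."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    b = lasso_coordinate_descent(Xs, yc, lam)
    return np.flatnonzero(b), b


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): only features 0 and 3 matter.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
    selected, coefs = select_features(X, y, lam=0.5)
    print("selected feature indices:", selected)
```

The soft-thresholding step is what sets small coefficients exactly to zero, which is how an L1 penalty ends up selecting features. A larger `lam` drops more features, and `lam` would normally be tuned, for example by cross-validation.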
