Orthogonal Least Squares Based Fast Feature Selection for Linear Classification

An Orthogonal Least Squares (OLS) based feature selection method is proposed for both binomial and multinomial classification. The novel Squared Orthogonal Correlation Coefficient (SOCC) is defined based on the Error Reduction Ratio (ERR) in OLS and used as the feature-ranking criterion. The equivalence between the canonical correlation coefficient, Fisher's criterion, and the sum of the SOCCs is revealed, which unveils the statistical implication of ERR in OLS for the first time. The OLS-based method is also shown to have speed advantages when applied in greedy search. The proposed method is comprehensively compared with mutual information based feature selection methods on 2 synthetic and 7 real-world datasets. The results show that the proposed method always ranks in the top 5 among the 10 candidate methods. Moreover, the proposed method can be applied directly to continuous features without discretisation, another significant advantage over mutual information based methods.
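To make the abstract's procedure concrete, the following is a minimal sketch of greedy OLS feature selection for the binary case, where the ERR of each feature (orthogonalised against the already-selected features) against a numeric label vector plays the role of the SOCC ranking criterion. The function names and the ±1 label encoding are illustrative assumptions, not the paper's actual implementation, and the multinomial extension (summing SOCCs over class-indicator columns) is omitted.

```python
import numpy as np

def err(w, y):
    """Error Reduction Ratio of an (orthogonalised) regressor w against the
    target y: the fraction of y's energy explained by w."""
    return (w @ y) ** 2 / ((w @ w) * (y @ y))

def ols_feature_ranking(X, y, k):
    """Greedy OLS feature selection (illustrative sketch).

    At each step, every remaining feature has already been deflated against
    the selected features; the feature with the largest ERR w.r.t. the
    numeric label vector y (e.g. +1/-1 for binary classes) is picked next.
    Returns the indices of the k selected features in selection order.
    """
    n, d = X.shape
    selected = []
    remaining = list(range(d))
    R = X.astype(float).copy()  # residual (orthogonalised) copies of the features
    for _ in range(min(k, d)):
        scores = [err(R[:, j], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        # deflate all features by the newly selected (normalised) direction
        q = R[:, best] / np.linalg.norm(R[:, best])
        R -= np.outer(q, q @ R)
        remaining.remove(best)
    return selected
```

Because each step only requires inner products with the label vector after a rank-one deflation, the greedy search avoids refitting a model per candidate subset, which is the speed advantage the abstract refers to.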
