Improving k Nearest Neighbors and Naïve Bayes Classifiers Through Space Transformations and Model Selection

Techniques such as prototype selection, normalization, and feature mapping aim to reduce the complexity of models and improve their accuracy. In this manuscript, we present a performance-boosting scheme for the well-known single-label k-Nearest Neighbors (k-NN) classifier and the Naïve Bayes (NB) classifier. The improvement comes as a pipeline of data transformations orchestrated by a model selection scheme. The classifiers are built by composing simpler parts found in several open-source libraries and can be assembled with little effort to replicate our proposal. We also explore ensembling and the effect of preprocessing and normalizing the data. We compare our approach experimentally with 17 popular classifiers on 34 different benchmarks using raw and rank-based scores; statistical tests support our results. For instance, in terms of average performance ranks under the balanced error rate, the models created with our proposal achieve the first, third, and fourth-best ranks, compared with the 10th position of raw k-NN and the 14th of raw NB.
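
The abstract describes building the improved classifiers by composing simpler parts from open-source libraries and tuning them with a model selection scheme. The sketch below is a rough illustration of that idea (not a reproduction of the authors' pipeline): it assumes scikit-learn as the library and assembles a pipeline that normalizes the data, maps it through a Nyström kernel feature map, and tunes a k-NN classifier with random search. The dataset, component choices, and hyper-parameter ranges are illustrative assumptions.

# Minimal sketch, assuming scikit-learn; not the authors' implementation.
from sklearn.datasets import load_wine
from sklearn.kernel_approximation import Nystroem
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # stand-in benchmark for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Pipeline: normalize -> map into a kernel feature space -> k-NN in that space.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("map", Nystroem(kernel="rbf", random_state=0)),
    ("knn", KNeighborsClassifier()),
])

# Model selection over the transformations and the classifier jointly,
# mirroring the "data transformations orchestrated by a model selection
# scheme" described in the abstract. The search space here is a guess.
search = RandomizedSearchCV(
    pipe,
    param_distributions={
        "map__gamma": [0.01, 0.1, 1.0, 10.0],
        "map__n_components": [50, 100, 150],
        "knn__n_neighbors": [1, 3, 5, 7, 11],
        "knn__weights": ["uniform", "distance"],
    },
    n_iter=20,
    cv=5,
    random_state=0,
)
search.fit(X_train, y_train)

# Balanced error rate (the metric cited in the abstract) on held-out data.
ber = 1.0 - balanced_accuracy_score(y_test, search.predict(X_test))
print("balanced error rate:", ber)

The same composition applies to the NB variant by swapping the final estimator, and several tuned pipelines could be combined in an ensemble; the specific components and search strategy used in the paper may differ from this sketch.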
