Fuzzy vector quantization with a step-optimizer to improve pattern classification

Abstract The paper proposes a novel supervised algorithm for data classification based on fuzzy vector quantization. The novelty lies in an optimization procedure that determines the step size of the quantization process. Adopting a fuzzy space for data exploration eliminates the problem of class overlap when dealing with high-dimensional non-linear data. In the proposed approach, the input data are first transformed from the Euclidean space to the fuzzy space with three membership grades. Next, the uniform manifold approximation and projection (UMAP) algorithm projects the data into a visual space by selecting the most contrasting features from the fuzzy vectors. These features are then passed through a quantization step with a novel step-optimization technique. Optimizing the quantization step and making it independent of the data set significantly speeds up the process. Because the class information of all feature vectors obtained by UMAP is known, the majority voting principle is used in the subsequent step to locate the class centroids, which in turn represent the class labels of the test samples. In the test phase, after the test vectors are obtained, the hyperspherical direction cosines (HDC) between the test vectors and the previously obtained class centroids are evaluated. The test sample is finally assigned the class label for which the sum of absolute differences (SAD) of these direction cosines is minimum. We validated our classifier on various benchmark data sets and achieved higher accuracy with significantly lower computation time than existing state-of-the-art algorithms.
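The test-phase decision rule described in the abstract can be illustrated with a minimal sketch. It assumes that the direction cosines of a vector are its components divided by its Euclidean norm, and that the SAD is taken component-wise between the direction cosines of the test vector and those of each class centroid; the abstract does not spell out either detail, so both are assumptions. The function names `direction_cosines` and `classify_by_sad` and the toy centroids are hypothetical and are not taken from the paper.

```python
import numpy as np


def direction_cosines(v):
    """Divide each component by the Euclidean norm (assumed HDC definition)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    if np.any(norm == 0):
        raise ValueError("direction cosines are undefined for a zero vector")
    return v / norm


def classify_by_sad(test_vector, centroids, labels):
    """Return the label of the centroid with the smallest SAD, plus all SADs."""
    test_dc = direction_cosines(test_vector)    # shape (d,)
    centroid_dc = direction_cosines(centroids)  # shape (k, d)
    # Sum of absolute differences between direction cosines, one per class.
    sad = np.abs(centroid_dc - test_dc).sum(axis=1)
    return labels[int(np.argmin(sad))], sad


# Toy example: two hypothetical class centroids in a 2-D feature space.
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = ["A", "B"]
predicted, sad = classify_by_sad([3.0, 1.0], centroids, labels)
print(predicted, sad)
```

Because every vector is normalized before comparison, the rule depends only on direction: scaling the test vector by a positive constant leaves the predicted label unchanged.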
