Low-Rank Kernel Space Representations in Prototype Learning

In supervised learning, feature vectors are often implicitly mapped to a high-dimensional space via the kernel trick, which incurs quadratic costs for the learning algorithm. The recently proposed random Fourier features provide an explicit mapping, so that classical algorithms, often with linear complexity, can be applied. Yet models built on random Fourier features typically remain complex and difficult to interpret. With Matrix Relevance Learning, a linear mapping of the data that improves class separation can be learned by adapting a parametric Euclidean distance; furthermore, a low-rank representation of the input data can be obtained. We apply this technique to random Fourier feature encoded data to obtain a discriminative mapping of the kernel space. This explicit approach is compared with a differentiable kernel vector quantizer operating on the same, but implicit, kernel representation. On multiple benchmark problems we demonstrate that a parametric distance on an RBF encoding yields better classification results and gives access to interpretable prediction models with visualization abilities.
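To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of the two stages described above: an explicit random Fourier feature map approximating the RBF kernel, followed by a GMLVQ-style learner with a rank-limited relevance matrix Omega, trained by stochastic gradient descent on the GLVQ cost. All names (`random_fourier_features`, `LowRankGMLVQ`) and hyperparameters (number of features, rank, learning rates, epochs) are illustrative assumptions.

```python
import numpy as np


def random_fourier_features(X, n_features=200, gamma=0.5, seed=0):
    """Explicit map z(x) with z(x).z(y) ~ exp(-gamma * ||x - y||^2)
    (Rahimi & Recht); W is drawn from the kernel's spectral density."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


class LowRankGMLVQ:
    """GMLVQ-style classifier: one prototype per class plus a rank-limited
    relevance matrix Omega, giving d(x, w) = ||Omega (x - w)||^2.
    Trained by SGD on the GLVQ cost mu = (d_plus - d_minus) / (d_plus + d_minus)."""

    def __init__(self, rank=2, lr_w=0.05, lr_omega=0.005, epochs=30, seed=0):
        self.rank, self.lr_w, self.lr_omega = rank, lr_w, lr_omega
        self.epochs, self.seed = epochs, seed

    def _dists(self, x):
        # Parametric squared Euclidean distance to every prototype.
        proj = (x - self.W) @ self.Omega.T
        return np.einsum("ij,ij->i", proj, proj)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.classes = np.unique(y)
        self.W = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        self.Omega = rng.normal(size=(self.rank, X.shape[1]))
        self.Omega /= np.linalg.norm(self.Omega)  # trace(Omega^T Omega) = 1
        for _ in range(self.epochs):
            for i in rng.permutation(len(X)):
                x, d = X[i], self._dists(X[i])
                same = self.classes == y[i]
                jp = np.flatnonzero(same)[np.argmin(d[same])]
                jm = np.flatnonzero(~same)[np.argmin(d[~same])]
                dp, dm = d[jp], d[jm]
                gp = 2.0 * dm / (dp + dm) ** 2    # d mu / d d_plus  (> 0)
                gm = -2.0 * dp / (dp + dm) ** 2   # d mu / d d_minus (< 0)
                ep, em = x - self.W[jp], x - self.W[jm]
                Lam = self.Omega.T @ self.Omega
                # Attract the correct prototype, repel the closest wrong one.
                self.W[jp] += self.lr_w * gp * 2.0 * (Lam @ ep)
                self.W[jm] += self.lr_w * gm * 2.0 * (Lam @ em)
                # Gradient step on the relevance matrix, then renormalize.
                grad = 2.0 * (gp * np.outer(self.Omega @ ep, ep)
                              + gm * np.outer(self.Omega @ em, em))
                self.Omega -= self.lr_omega * grad
                self.Omega /= np.linalg.norm(self.Omega)
        return self

    def predict(self, X):
        return np.array([self.classes[np.argmin(self._dists(x))] for x in X])


if __name__ == "__main__":
    # Toy problem with a nonlinear (circular) decision boundary.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 2))
    y = (np.sum(X**2, axis=1) > 1.5).astype(int)
    Z = random_fourier_features(X)
    model = LowRankGMLVQ(rank=2).fit(Z, y)
    print("training accuracy:", np.mean(model.predict(Z) == y))
```

With rank=2, the rows of the learned Omega span a two-dimensional discriminative projection of the kernel space: plotting Z @ model.Omega.T yields the kind of low-rank visualization the abstract refers to.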
