Prototype-based Neural Network Layers: Incorporating Vector Quantization

Neural networks currently dominate the machine learning community, and they do so for good reasons. Their accuracy on complex tasks such as image classification is unrivaled at the moment, and with recent improvements they are reasonably easy to train. Nevertheless, neural networks lack robustness and interpretability. Prototype-based vector quantization methods, on the other hand, are known for being robust and interpretable. For this reason, we propose techniques and strategies to merge both approaches. This contribution particularly highlights the similarities between the two and outlines how to construct a prototype-based classification layer for multilayer networks. Additionally, we provide an alternative, prototype-based approach to the classical convolution operation. Numerical results are not part of this report; instead, the focus lies on establishing a strong theoretical framework. By publishing our framework and the respective theoretical considerations and justifications before finalizing our numerical experiments, we hope to jump-start the incorporation of prototype-based learning in neural networks and vice versa.
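Since the abstract describes the construction only at a high level, the following is a minimal illustrative sketch (in PyTorch) of what a prototype-based classification layer could look like. It assumes one learnable prototype per class and a squared Euclidean distance; the class name `PrototypeClassificationLayer` and all design choices below are assumptions for illustration, not the layer proposed in the paper.

```python
import torch
import torch.nn as nn

class PrototypeClassificationLayer(nn.Module):
    """Hypothetical sketch: classify by distance to learnable class prototypes."""

    def __init__(self, num_classes: int, feature_dim: int):
        super().__init__()
        # One learnable prototype vector per class, living in feature space.
        self.prototypes = nn.Parameter(torch.randn(num_classes, feature_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feature_dim). Squared Euclidean distance to each prototype.
        d = torch.cdist(x, self.prototypes, p=2) ** 2   # (batch, num_classes)
        # Negative distances serve as class scores: the nearest prototype wins.
        # These scores can be fed to cross-entropy or an LVQ-style cost function.
        return -d
```

The prototype-based alternative to the convolution operation mentioned in the abstract can be sketched in the same spirit: instead of taking a dot product between a filter and an image patch, one measures the (negative squared) distance between each patch and a prototype filter. Again, `PrototypeConv2d` and its details are hypothetical, not the paper's definitive construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeConv2d(nn.Module):
    """Hypothetical sketch: a convolution-like layer whose response is the
    negative squared Euclidean distance between patches and prototypes."""

    def __init__(self, in_channels: int, num_prototypes: int, kernel_size: int):
        super().__init__()
        self.kernel_size = kernel_size
        patch_dim = in_channels * kernel_size * kernel_size
        # Each prototype plays the role of one convolution filter.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, patch_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width). Extract all sliding patches.
        patches = F.unfold(x, self.kernel_size)      # (batch, patch_dim, n_patches)
        patches = patches.transpose(1, 2)            # (batch, n_patches, patch_dim)
        protos = self.prototypes.unsqueeze(0).expand(x.shape[0], -1, -1)
        d = torch.cdist(patches, protos, p=2) ** 2   # (batch, n_patches, n_protos)
        # Fold the per-patch responses back into a spatial feature map.
        h_out = x.shape[2] - self.kernel_size + 1
        w_out = x.shape[3] - self.kernel_size + 1
        return (-d).transpose(1, 2).reshape(x.shape[0], -1, h_out, w_out)
```

Under this reading, a standard convolution and the prototype variant differ only in the patch response function (dot product versus negative distance), which is what makes a drop-in replacement plausible.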
