On Distributed Quantization for Classification

We consider the problem of distributed feature quantization, where the goal is to enable a pretrained classifier at a central node to carry out its classification on features gathered from distributed nodes over communication-constrained channels. We design distributed quantization schemes tailored specifically to the classification task: unlike conventional schemes that help the central node reconstruct the original signal as accurately as possible, our objective is not reconstruction accuracy but correct classification. We make no a priori distributional assumptions on the data; instead, we use training data to design the quantizers. Our main contributions are as follows: we prove that finding optimal quantizers is NP-hard in the general case; we design an optimal scheme for a special case; and we propose quantization algorithms that leverage discrete neural representations and training data, and that can be designed in polynomial time for any number of features, any number of classes, and an arbitrary division of features across the distributed nodes. We find that tailoring the quantizers to the classification task can offer significant savings: compared to alternatives, we achieve more than a twofold reduction in the number of bits communicated, at the same classification accuracy.
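The abstract does not reproduce the paper's actual constructions, so the following is a minimal PyTorch sketch of the general recipe it describes: each distributed node learns a discrete (vector-quantized) encoder, trained end-to-end through a frozen, pretrained central classifier against the classification loss rather than a reconstruction loss. All module names, dimensions, and hyperparameters here are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeQuantizer(nn.Module):
    """One distributed node: encodes its local features into one of 2**bits codewords."""
    def __init__(self, in_dim, code_dim, bits):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, code_dim), nn.ReLU(), nn.Linear(code_dim, code_dim)
        )
        self.codebook = nn.Parameter(torch.randn(2 ** bits, code_dim))

    def forward(self, x):
        z = self.encoder(x)
        idx = torch.cdist(z, self.codebook).argmin(dim=1)  # `bits` bits sent per sample
        q = self.codebook[idx]
        # VQ-VAE-style auxiliary loss keeps the codewords near the encoder outputs
        vq_loss = F.mse_loss(q, z.detach()) + 0.25 * F.mse_loss(z, q.detach())
        # straight-through estimator: quantize on the forward pass,
        # pass gradients through to the encoder on the backward pass
        return z + (q - z).detach(), vq_loss

# Toy setup: two nodes each observe half of a 20-dim feature vector;
# the central node runs a frozen, pretrained classifier on the received codes.
torch.manual_seed(0)
nodes = nn.ModuleList([NodeQuantizer(in_dim=10, code_dim=8, bits=3) for _ in range(2)])
classifier = nn.Linear(16, 4)              # stand-in for the pretrained central model
for p in classifier.parameters():
    p.requires_grad_(False)               # frozen: only the quantizers are trained

opt = torch.optim.Adam(nodes.parameters(), lr=1e-3)
x, y = torch.randn(32, 20), torch.randint(0, 4, (32,))  # one synthetic batch
c0, aux0 = nodes[0](x[:, :10])
c1, aux1 = nodes[1](x[:, 10:])
logits = classifier(torch.cat([c0, c1], dim=1))
# the quantizers are optimized for classification accuracy, not reconstruction error
loss = F.cross_entropy(logits, y) + aux0 + aux1
opt.zero_grad()
loss.backward()
opt.step()
```

The key design choice is the training objective: replacing `F.cross_entropy` with a reconstruction loss would recover a conventional distributed quantizer, which is exactly the kind of baseline the abstract argues spends its bit budget on the wrong goal.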
