Fast Quantized Arithmetic on x86: Trading Compute for Data Movement

We introduce Clover, a new library for efficient computation using low-precision data, providing the mathematical routines required by fundamental methods in optimization and sparse recovery. Our library faithfully implements variants of stochastic quantization that guarantee convergence at low precision, and supports data formats ranging from 4-bit quantized to 32-bit IEEE-754 on current Intel processors. In particular, we show that 4-bit arithmetic can be implemented efficiently using Intel AVX despite the lack of native support for this data format. Experimental results with dot product, matrix-vector multiplication (MVM), gradient descent (GD), and iterative hard thresholding (IHT) demonstrate that the attainable speedups are in many cases close to linear in the reduction of precision, owing to reduced data movement. Finally, for GD and IHT, we show examples of absolute speedup of 4-bit over 32-bit when iterating until a given target error is reached.
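To make the stochastic quantization concrete: the key property is that rounding to a grid point happens probabilistically, so the quantizer is unbiased and the convergence guarantees of the underlying iterative methods carry over. The following plain-C++ sketch illustrates this for a signed 4-bit grid; it is our own minimal illustration, not Clover's API (the function name `quantize4` and the per-vector scaling scheme are assumptions), and it is scalar rather than AVX-vectorized.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Stochastically quantize x onto the signed 4-bit grid {-7, ..., 7},
// scaled so that the largest magnitude maps to 7. A value is rounded
// up with probability equal to its fractional part on the grid, which
// makes the quantizer unbiased: E[dequantized value] = original value.
std::vector<int8_t> quantize4(const std::vector<float>& x, float& scale) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    scale = (max_abs > 0.0f) ? max_abs / 7.0f : 1.0f;

    std::mt19937 rng(std::random_device{}());
    std::uniform_real_distribution<float> coin(0.0f, 1.0f);

    std::vector<int8_t> q(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        float t  = x[i] / scale;        // position on the quantization grid
        float lo = std::floor(t);
        // Round up with probability (t - lo), otherwise round down.
        q[i] = static_cast<int8_t>(lo + (coin(rng) < t - lo ? 1.0f : 0.0f));
    }
    return q;
}
```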
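Since AVX has no native 4-bit lanes, an efficient implementation must choose a packing layout and defer floating-point work. The sketch below (again our own illustration under assumed names `pack_pair`, `unpack_lo`, `unpack_hi`, `dot4`, not Clover's actual kernels, which use AVX intrinsics) shows the two ingredients: two signed nibbles per byte, and a dot product that accumulates in integer arithmetic and applies the two per-vector scales only once at the end.

```cpp
#include <cstdint>
#include <vector>

// Two signed 4-bit values share one byte: element 2i in the low nibble,
// element 2i+1 in the high nibble.
inline uint8_t pack_pair(int8_t a, int8_t b) {
    return static_cast<uint8_t>((a & 0x0F) | ((b & 0x0F) << 4));
}
// Arithmetic right shifts sign-extend each nibble back to 8 bits.
inline int8_t unpack_lo(uint8_t p) { return static_cast<int8_t>(p << 4) >> 4; }
inline int8_t unpack_hi(uint8_t p) { return static_cast<int8_t>(p) >> 4; }

// Dot product of two packed 4-bit vectors (a.size() bytes = 2*n nibbles):
// all multiply-accumulates run in int32; the scales are applied once.
float dot4(const std::vector<uint8_t>& a, float scale_a,
           const std::vector<uint8_t>& b, float scale_b) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += static_cast<int32_t>(unpack_lo(a[i])) * unpack_lo(b[i]);
        acc += static_cast<int32_t>(unpack_hi(a[i])) * unpack_hi(b[i]);
    }
    return scale_a * scale_b * static_cast<float>(acc);
}
```

The deferred rescaling is what lets the inner loop stay in cheap integer arithmetic while moving a quarter of the data of a 32-bit implementation, which is the source of the near-linear speedups reported above.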
