Throughput-Distortion Computation of Generic Matrix Multiplication: Toward a Computation Channel for Digital Signal Processing Systems

The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the imprecision (distortion) of computation. Our technique applies adaptive scalar companding and rounding to input matrix blocks, followed by two forms of packing in floating point that allow for concurrent calculation of multiple results. Since the adaptive companding process controls the increase of concurrency (via packing), the increase in processing throughput (and the corresponding increase in distortion) depends on the input data statistics. To demonstrate this, we derive the optimal throughput-distortion control framework for GEMM for the broad class of zero-mean, independent and identically distributed input sources. Our approach converts matrix multiplication in programmable processors into a computation channel: when the processing throughput increases, the output noise (error) increases due to (i) coarser quantization and (ii) computational errors caused by exceeding the machine-precision limitations. We show that, when a certain level of distortion in the GEMM computation is tolerated, the proposed framework can significantly surpass 100% of the peak performance of a given processor. The practical benefits of our proposal are shown in a face recognition system and a multilayer perceptron system trained for metadata learning from a large music feature database.
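The idea of packing in floating point can be illustrated with a minimal sketch. It is not the authors' method: it uses plain uniform rounding in place of adaptive companding, shows only one simple form of packing, and refuses to run when machine precision would be exceeded rather than tolerating the resulting error. All function names (`matmul`, `quantize`, `packed_two_products`) and the step-size parameters are hypothetical. Two quantized blocks are combined into one matrix as `Q1 + K * Q2`, a single product is computed, and both results are then separated again.

```python
def matmul(X, Y):
    """Plain triple-loop matrix product on lists of floats."""
    n, k, m = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def quantize(X, step):
    """Uniform rounding to integer levels; a larger step means more distortion."""
    return [[float(round(x / step)) for x in row] for row in X]


def packed_two_products(A1, A2, B, step_a, step_b):
    """Approximate A1 @ B and A2 @ B using one floating-point product.

    Assumption (not from the source): uniform quantization and a single
    power-of-two packing factor K, with exactness enforced, not relaxed.
    """
    Q1, Q2 = quantize(A1, step_a), quantize(A2, step_a)
    QB = quantize(B, step_b)
    k = len(QB)
    amax = max(abs(v) for row in Q1 + Q2 for v in row)
    bmax = max(abs(v) for row in QB for v in row)
    bound = k * amax * bmax  # worst-case magnitude of any integer result

    # Choose K as a power of two with K > 2 * bound, so the low result
    # can never spill into the high result.
    K = 2.0
    while K <= 2 * bound:
        K *= 2.0

    # All intermediate sums must stay below 2**53 to be exact in doubles.
    if K * bound + bound >= 2.0 ** 53:
        raise ValueError("quantization too fine to pack two results exactly")

    # Pack both blocks into one matrix and multiply once.
    P = [[q1 + K * q2 for q1, q2 in zip(r1, r2)] for r1, r2 in zip(Q1, Q2)]
    R = matmul(P, QB)

    # Unpack: the high part is R / K rounded, the low part is the remainder.
    hi = [[float(round(r / K)) for r in row] for row in R]
    lo = [[r - K * h for r, h in zip(rr, hr)] for rr, hr in zip(R, hi)]

    scale = step_a * step_b  # undo the quantization scaling
    C1 = [[v * scale for v in row] for row in lo]
    C2 = [[v * scale for v in row] for row in hi]
    return C1, C2
```

In this sketch, coarser steps shrink `bound` and leave more headroom below 2^53, which is what would allow more results to share one product; finer steps reduce distortion but eventually make packing impossible. That is the throughput-distortion trade-off in its simplest form.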
