Performance/Energy Optimization of DSP Transforms on the XScale Processor

The XScale processor family provides user-controllable independent configuration of CPU, bus, and memory frequencies. This feature introduces another handle for the code optimization with respect to energy consumption or runtime performance. We quantify the effect of frequency configurations on both performance and energy for three signal processing transforms: the discrete Fourier transform (DFT), finite impulse response (FIR) filters, and the Walsh-Hadamard Transform (WHT). To do this, we use SPIRAL, a program generation and optimization system for signal processing transforms. For a given transform to be implemented, SPIRAL searches over different algorithms to find the best match to the given platform with respect to the chosen performance metric (usually runtime). In this paper we use SPIRAL to generate implementations for different frequency configurations and optimize for runtime and physically measured energy consumption. In doing so we show that first, each transform achieves best performance/energy consumption for a different system configuration; second, the best code depends on the chosen configuration, problem size and algorithm; third, the fastest implementation is not always the most energy efficient; fourth, we introduce dynamic (i.e., during execution) reconfiguration in order to further improve performance/energy. Finally, we benchmark SPIRAL generated code against Intel's vendor library routines. We show competitive results as well as 20% performance improvements or energy reduction for selected transforms and problem sizes.

[1]  Jack J. Dongarra,et al.  Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..

[2]  Sharad Malik,et al.  Compile-time dynamic voltage scaling settings: opportunities and limits , 2003, PLDI '03.

[3]  Christian Poellabauer,et al.  Monitoring of cache miss rates for accurate dynamic voltage and frequency scaling , 2005, IS&T/SPIE Electronic Imaging.

[4]  Yuefan Deng,et al.  New trends in high performance computing , 2001, Parallel Computing.

[5]  Gilberto Contreras,et al.  Power prediction for Intel XScale processors using performance monitoring unit events , 2005 .

[6]  Margaret Martonosi,et al.  Power prediction for Intel XScale/spl reg/ processors using performance monitoring unit events , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[7]  Robert A. van de Geijn,et al.  The science of deriving dense linear algebra algorithms , 2005, TOMS.

[8]  José M. F. Moura,et al.  Fast automatic software implementations of FIR filters , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[9]  Aviral Shrivastava,et al.  A customizable compiler framework for embedded systems , 2001 .

[10]  José M. F. Moura,et al.  Automatic implementation and platform adaptation of discrete filtering and wavelet algorithms , 2004 .

[11]  Steven G. Johnson,et al.  The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.

[12]  Franz Franchetti,et al.  SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.

[13]  Ulrich Kremer,et al.  The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction , 2003, PLDI '03.

[14]  Franz Franchetti,et al.  Formal loop merging for signal transforms , 2005, PLDI '05.

[15]  Richard W. Vuduc,et al.  Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..