Optimized Fundamental Signal Processing Operations For Energy Minimization on Heterogeneous Mobile Devices

Numerous signal processing applications are emerging on both mobile and high-performance computing systems. These applications are subject to responsiveness constraints for user interactivity and, at the same time, must be optimized for energy efficiency. The increasingly heterogeneous power-versus-performance profile of modern hardware introduces new opportunities for energy savings as well as new challenges. In this vein, recent systems-on-chip (SoCs) that combine low-power multicore processors with a small graphics accelerator (GPU) offer a notable increase in computational capacity while partially retaining the low power consumption of embedded systems. This paper analyzes the potential of these new hardware systems to accelerate applications that involve a large number of floating-point arithmetic operations, mainly in the form of convolutions. To assess performance, we developed a headphone-based spatial audio application for mobile devices built on a Samsung Exynos 5422 SoC. We discuss different implementations and analyze the tradeoffs between performance and energy efficiency for different scenarios and configurations. Our experimental results reveal that properly configuring and leveraging the computational resources can extend the battery lifetime of a device featuring such an architecture by 238%.
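To make the workload concrete, the following is a minimal sketch of the kind of convolution that dominates headphone-based spatial audio: one mono source is filtered with a pair of impulse responses, one per ear. The function names `fir_convolve` and `binaural_render` and the sample values are assumptions chosen for illustration. This is a plain direct-form reference, not the implementation evaluated in the paper.

```python
def fir_convolve(signal, taps):
    """Direct-form linear convolution: y[n] = sum_k taps[k] * signal[n - k]."""
    # Full convolution output has len(signal) + len(taps) - 1 samples.
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            # Each input sample contributes a scaled copy of the filter.
            out[n + k] += x * h
    return out


def binaural_render(mono, ir_left, ir_right):
    """Filter one mono source with a (hypothetical) left/right impulse-response pair."""
    return fir_convolve(mono, ir_left), fir_convolve(mono, ir_right)


if __name__ == "__main__":
    # Illustrative values only; real impulse responses are much longer.
    left, right = binaural_render([1.0, 0.0], [0.5], [0.25, 0.25])
    print(left, right)
```

The nested loop costs one multiply-add per (sample, tap) pair, so the arithmetic grows with the product of signal length and filter length. That cost is why mapping such kernels onto different compute resources of an SoC affects both responsiveness and energy use.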
