Direct memory access usage optimization in network applications for reduced memory latency and energy consumption

Wireless networks are becoming increasingly ubiquitous. Typically, several complex multi-threaded applications are mapped onto a single embedded system, and each is triggered by a different input stream, depending on the run-time behaviour of the user and the environment. This dynamism makes it very complex, if not impossible, to fully analyze these systems at design time. Run-time information must therefore be used to produce an efficient design. This introduces new challenges, especially for embedded system designers who use a Direct Memory Access (DMA) module: they must know the memory transfer behaviour of the whole system in advance in order to design and program the DMA efficiently. This is especially important in embedded systems with DRAM memories, because concurrent accesses from different processing elements can adversely affect the efficiency of the page-based architecture of these memories. Moreover, the increasingly common use of dynamic data types further complicates the problem, because the exact location of data instances in memory is unknown at design time. In this paper, we propose a system-level optimization methodology that automatically adapts the DMA usage parameters at run time according to online information. With the proposed approach, we reduce the mean latency of memory transfers by more than 18%, which lowers the average number of cycles that processing elements or DMAs waste waiting for data from main memory, while also optimizing energy consumption and system responsiveness. We evaluate our approach using a set of real-life applications and real wireless dynamic streams.
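
The effect of concurrent accesses on page-based DRAM can be illustrated with a toy model. The sketch below is not the methodology of the paper. It assumes a single DRAM bank with an open-page (row buffer) policy, a hypothetical row size of 1024 bytes, and two hypothetical processing elements that each stream 64 words from their own memory region. It counts how often the open row has to change when the two streams are interleaved word by word, compared with when each stream is served as one contiguous burst, as a DMA block transfer would do.

```python
def count_row_misses(addresses, row_size=1024):
    """Count row-buffer misses for one DRAM bank with an open-page policy.

    Assumption: a miss occurs whenever an access targets a row other than
    the one currently held open; the first access always misses.
    """
    open_row = None
    misses = 0
    for addr in addresses:
        row = addr // row_size
        if row != open_row:
            misses += 1
            open_row = row
    return misses


# Hypothetical access streams: 64 consecutive 4-byte words per processing element.
pe_a = [0x0000 + 4 * i for i in range(64)]  # all in row 0
pe_b = [0x8000 + 4 * i for i in range(64)]  # all in row 32

# Fine-grained concurrency: the two streams alternate word by word.
interleaved = [addr for pair in zip(pe_a, pe_b) for addr in pair]

# Burst-style transfers: each stream is served as one contiguous block.
burst = pe_a + pe_b

print("interleaved misses:", count_row_misses(interleaved))
print("burst misses:      ", count_row_misses(burst))
```

In this model every interleaved access forces a row change, while the burst ordering opens each row only once. Real memory controllers have multiple banks and reordering logic, so the gap is smaller in practice, but the example shows why transfer granularity and ordering are parameters worth tuning.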
