A hardware cache memcpy accelerator

In this paper, we present a hardware solution to perform the commonly used memcpy operation with the goal to reduce the time to perform the actual memory copies. This is accomplished by taking advantage of the presence of a cache that is found next to many current-day (embedded) processors. Additionally, the currently presented solution assumes that to be copied data is already in the cache and is aligned by the cache-line size. We present the concept and implementation details of the proposed hardware module and the system used to experiment both our hardware and an optimized software implementation of the memcpy function. Experimental results show that the proposed hardware solution is at least 79% faster than an optimized hand-coded software solution