Efficient and low-latency systolic array architecture for full searches in block-matching motion estimation

This paper describes an efficient, low-latency systolic array architecture for full searches in block-matching motion estimation. A novel ring-like systolic array architecture is derived from the conventional one-dimensional systolic array architecture through operator rescheduling that exploits the symmetry of the data flow. The high latency caused by filling the array pipeline in the conventional architecture is eliminated. The new architecture delivers a higher throughput rate, achieves higher processor utilization, and has low power consumption. In addition, the minimum memory bandwidth of the conventional architecture is preserved.
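
For orientation, the computation that such an array accelerates can be sketched in software. The following is a minimal reference sketch of full-search block matching, not the architecture described above. It assumes the sum of absolute differences (SAD) as the matching criterion, which the abstract does not specify, and the names `sad`, `full_search`, the block size `n`, and the search range `p` are hypothetical and chosen only for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy). SAD is an assumed criterion."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] that keeps the candidate
    block inside the reference frame. Returns (best_sad, (dx, dy)), or None
    if no candidate fits."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic 8 x 8 frames (illustrative data only): the current frame is the
# reference frame shifted by 2 pixels horizontally and 1 pixel vertically.
SIZE = 8
ref_frame = [[x * x + 10 * y * y for x in range(SIZE)] for y in range(SIZE)]
cur_frame = [
    [ref_frame[y + 1][x + 2] if y + 1 < SIZE and x + 2 < SIZE else 0
     for x in range(SIZE)]
    for y in range(SIZE)
]

result = full_search(cur_frame, ref_frame, bx=2, by=2, n=2, p=2)
print(result)
```

Every candidate displacement requires an independent SAD accumulation over the same block data, and this regular, repetitive structure is what makes the full search a natural fit for a systolic array of processing elements.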