A novel modular systolic array architecture for full-search block matching motion estimation

Proposes a modular systolic array architecture for the full-search block matching motion estimation algorithm (FBMA). With this architecture, the authors are able to generate a motion vector for every reference block in raster scan order while achieving 100% processor utilization and a high throughput rate. Furthermore, they devise a scheme that reduces the I/O pin count by sharing memory units, which in turn keeps the required memory bandwidth low. The architecture is scalable in that it can easily be adapted to handle larger search ranges and different block sizes without increasing the effective latency.
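
For readers unfamiliar with full-search block matching, the following is a minimal software sketch of the algorithm the architecture accelerates. It is not the systolic array itself. It assumes the sum of absolute differences (SAD) as the matching criterion, which the abstract does not specify, and it takes the abstract's "reference block" to mean the block of the current frame being matched. The names `sad`, `full_search`, `current`, `previous`, `n` (block size), and `p` (search range) are hypothetical and chosen only for illustration.

```python
def sad(current, previous, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `current`
    at (bx, by) and the block of `previous` displaced by (dx, dy).
    SAD is an assumed criterion; the abstract does not name one."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(current[by + y][bx + x]
                         - previous[by + dy + y][bx + dx + x])
    return total


def full_search(current, previous, n, p):
    """Return {(bx, by): (dx, dy)} with one motion vector per n x n block,
    visiting blocks in raster-scan order (left to right, top to bottom)."""
    height, width = len(current), len(current[0])
    vectors = {}
    for by in range(0, height - n + 1, n):
        for bx in range(0, width - n + 1, n):
            best, best_cost = (0, 0), None
            # Exhaustively test every displacement in [-p, p] x [-p, p].
            for dy in range(-p, p + 1):
                for dx in range(-p, p + 1):
                    # Skip candidates that fall outside the previous frame.
                    if not (0 <= by + dy and by + dy + n <= height
                            and 0 <= bx + dx and bx + dx + n <= width):
                        continue
                    cost = sad(current, previous, bx, by, dx, dy, n)
                    # Strict comparison keeps the first minimum found.
                    if best_cost is None or cost < best_cost:
                        best, best_cost = (dx, dy), cost
            vectors[(bx, by)] = best
    return vectors


# Usage: build a 12 x 12 frame with distinct local structure, then form a
# second frame whose content is the first one displaced by (2, 1).
size = 12
previous = [[x * x + 17 * y * y + 3 * x * y for x in range(size)]
            for y in range(size)]
current = [[previous[y + 1][x + 2] if y + 1 < size and x + 2 < size else 0
            for x in range(size)]
           for y in range(size)]
vectors = full_search(current, previous, n=4, p=2)
print(vectors[(4, 4)])  # (2, 1)
```

Each block costs (2p + 1)^2 candidate positions times n^2 absolute differences, and those operations are independent across candidates. That regularity is what makes the algorithm a natural fit for a systolic array of the kind proposed here.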