Rethinking image registration on customizable hardware

Image registration is one of the most important tasks in image processing and is frequently one of the most computationally intensive. In cases where there is a high likelihood of finding the exact template in the search image, correlation-based methods predominate. Presumably this is because the computational complexity of a correlation operation can be reduced substantially by transforming the task into the frequency domain. Alternative methods such as minimum Sum of Squared Differences (minSSD) are not so tractable and are normally disfavored. This bias is justified on conventional computer processors, where the operations must be conducted in an essentially sequential manner. However, we demonstrate that it is normally unjustified when the processing is undertaken on customizable hardware such as FPGAs, where tasks can be temporally and/or spatially parallelized. This is because the gate-based logic of an FPGA is better suited to the operations minSSD requires: signed-addition hardware can be implemented very cheaply in FPGA fabric, and square operations are easily implemented via a look-up table. In contrast, correlation-based methods require extensive use of multiplier hardware, which cannot be implemented so cheaply in the device. Even on modern DSP-oriented FPGAs, which contain many "hard" multipliers, we can implement at least an order of magnitude more minSSD hardware modules than cross-correlation modules. We demonstrate the successful use and comparison of both techniques within an FPGA for the registration and correction of turbulence-degraded images.
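The contrast in per-pixel operations can be illustrated with a short software sketch. This is not the hardware design described here; it is a minimal Python model that assumes 8-bit grayscale pixels and an exhaustive search over integer offsets. All names (`SQUARE_LUT`, `ssd_at`, `cross_correlation_at`, `min_ssd_register`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: function and variable names are assumptions,
# and pixels are assumed to be 8-bit integers (0..255).

# Squares of every possible absolute pixel difference, computed once.
# This stands in for the look-up table that replaces a multiplier.
SQUARE_LUT = [d * d for d in range(256)]


def ssd_at(search, template, row, col):
    """Sum of squared differences between the template and the
    search-image window whose top-left corner is (row, col)."""
    total = 0
    for r, t_row in enumerate(template):
        s_row = search[row + r]
        for c, t in enumerate(t_row):
            diff = s_row[col + c] - t        # signed subtraction
            total += SQUARE_LUT[abs(diff)]   # square by table look-up
    return total


def cross_correlation_at(search, template, row, col):
    """Unnormalized cross-correlation at the same window position.
    Each pixel pair costs one general multiplication."""
    total = 0
    for r, t_row in enumerate(template):
        s_row = search[row + r]
        for c, t in enumerate(t_row):
            total += s_row[col + c] * t
    return total


def min_ssd_register(search, template):
    """Exhaustively search all offsets and return (row, col, ssd)
    for the window with the smallest SSD."""
    t_h, t_w = len(template), len(template[0])
    s_h, s_w = len(search), len(search[0])
    best = None
    for row in range(s_h - t_h + 1):
        for col in range(s_w - t_w + 1):
            score = ssd_at(search, template, row, col)
            if best is None or score < best[2]:
                best = (row, col, score)
    return best


search = [
    [10, 10, 10, 10],
    [10, 10, 200, 50],
    [10, 10, 60, 220],
    [10, 10, 10, 10],
]
template = [[200, 50], [60, 220]]
print(min_ssd_register(search, template))
```

In this model the inner loop of `ssd_at` uses only a subtraction, a table look-up, and an addition, whereas `cross_correlation_at` needs a full multiplication for every pixel pair. In software the two cost about the same; the difference matters when each operation must be built from FPGA logic and replicated many times in parallel.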