Assessment of fused videos using scanpaths: a comparison of data analysis methods.

The increased interest in image fusion (combining images of two or more modalities, such as infrared and visible light radiation) has led to a need for accurate and reliable image assessment methods. Previous work has often relied upon subjective quality ratings combined with some form of computational metric analysis. However, we have shown previously that such methods do not correlate well with how people perform in actual tasks utilising fused images. The current study presents the novel use of an eye-tracking paradigm to record how accurately participants could track an individual in various fused video displays. Participants were asked to track a man in a camouflage outfit in various videos (visible and infrared originals, a fused average of the inputs, and two different wavelet-based fused videos) whilst also carrying out a secondary button-press task. The results were analysed in two ways: once by calculating accuracy across the whole video, and once by dividing the video into three time sections based on video content. Although the pattern of results depends on the analysis, the accuracy for the inputs was generally found to be significantly worse than that for the fused displays. In conclusion, both approaches have good potential as new fused video assessment methods, depending on what task is carried out.