Effect of Saliency-Based Masking in Scene Classification

This paper investigates the effect of attention-based local feature selection on 15-class scene classification, extending previous research showing that each spatial scale affects scene classification performance differently in the early stage of human visual processing. Visual saliency is used as the criterion for selecting the local regions from which HoG features are extracted. Experimental results show that such saliency-based masking significantly affects classification performance: contrary to previous reports in the field of object recognition, low-saliency regions contribute more than high-saliency regions in scene classification. This is consistent with several previous reports emphasizing the importance of spatial layout in the low-frequency channel, which support the scene schema hypothesis. The results also imply that the most salient regions, those in the top 20 percent of saliency, hardly contribute to classification performance.
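
As a rough illustration of the saliency-based masking described above, the following Python sketch averages a saliency map over a grid of cells, keeps only the cells whose saliency falls inside a chosen percentile band, and zeroes out the local descriptors of all other cells. The cell size, the percentile band, and all function names are assumptions made for illustration and are not taken from the paper. The saliency map and the per-cell HoG descriptors are assumed to be computed elsewhere.

```python
import numpy as np


def cell_saliency(saliency, cell):
    """Mean saliency of each non-overlapping cell x cell block.

    Pixels that do not fill a complete block at the right or bottom
    edge are discarded.
    """
    h, w = saliency.shape
    rows, cols = h // cell, w // cell
    cropped = saliency[:rows * cell, :cols * cell]
    return cropped.reshape(rows, cell, cols, cell).mean(axis=(1, 3))


def saliency_band_mask(cell_sal, low_pct, high_pct):
    """Boolean mask of cells inside the [low_pct, high_pct] percentile band."""
    lo, hi = np.percentile(cell_sal, [low_pct, high_pct])
    return (cell_sal >= lo) & (cell_sal <= hi)


def mask_features(features, mask):
    """Zero the descriptors of cells outside the mask.

    features has shape (rows, cols, dim), one descriptor per cell
    (hypothetical layout; e.g. one HoG histogram per cell).
    """
    return features * mask[..., None]


# Usage with synthetic data: drop the most salient 20 percent of cells.
rng = np.random.default_rng(0)
saliency_map = rng.random((64, 64))       # stand-in for a real saliency map
descriptors = rng.random((8, 8, 9))       # stand-in for per-cell HoG features
per_cell = cell_saliency(saliency_map, cell=8)
keep = saliency_band_mask(per_cell, low_pct=0, high_pct=80)
masked = mask_features(descriptors, keep)
```

Selecting a band by percentile rather than by an absolute threshold keeps the fraction of retained cells comparable across images, which is one plausible way to realize a condition such as "top 20 percent in saliency".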