In this paper we consider the interaction between different semantic levels in still image scene classification and object detection problems. We present a method where a neural method is used to produce a tentative higher-level semantic scene representation from low-level statistical visual features in a bottom-up fashion. This emergent representation is then used to refine the lower-level object detection results. We evaluate the proposed method with data from Pascal VOC Challenge 2006 image classification and object detection competition. The proposed techniques for exploiting global classification results are found to significantly improve the accuracy of local object detection.
[1]
Pasi Koikkalainen,et al.
Progress with the Tree-Structured Self-Organizing Map
,
1994,
ECAI.
[2]
Teuvo Kohonen,et al.
Self-Organizing Maps
,
2010
.
[3]
Alfred Ultsch,et al.
Data Mining and Knowledge Discovery with Emergent Self-Organizing Feature Maps for Multivariate Time Series
,
1999
.
[4]
Erkki Oja,et al.
PicSOM-self-organizing image retrieval with MPEG-7 content descriptors
,
2002,
IEEE Trans. Neural Networks.
[5]
Jorma Laaksonen,et al.
PicSOM Experiments in TRECVID 2018
,
2015,
TRECVID.