Real-World Scene Categorisation by a Self-Organising Neural Network

There is now a great deal of evidence that the visual system identifies the category of a scene before identifying its component objects. Schyns and Oliva (1994 Psychological Science 5 195–200) showed that scene recognition could be initiated from coarse blobs alone, that is, from information too coarse for the identity of individual objects to be recovered. One possible strategy for object recognition, therefore, is first to recognise the background scene from very coarse information and then to recognise the component objects from fine-scale information. However, this strategy is useful only to the extent that the coarse scale carries enough information for the background scene to be recognised. We present a scene categorisation model in which low and very low spatial frequencies alone provide enough information to produce consistent clusters of five distinct scene categories (beach, city, forest, room, and valley). A new self-organising neural network, curvilinear component analysis (P Demartines, J Herault, 1997 IEEE Transactions on Neural Networks 8 149–154), was used to project filtered versions of real-world images nonlinearly onto a two-dimensional output space. The resulting projection preserved the semantic proximities between the scene categories. These results offer formal evidence that coarse-scale information is sufficient for scene recognition.
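The abstract does not give the filters or the network settings. The following is therefore only a minimal NumPy sketch of the two stages it describes, built on stated assumptions. The `lowpass` helper is hypothetical: it keeps low spatial frequencies with a Gaussian transfer function chosen for illustration, and its `cutoff` parameter is also hypothetical. The `cca` function follows the stochastic curvilinear component analysis update. In that update, a randomly chosen output point y_i stays fixed while every other point y_j is moved to reduce the mismatch between input distance X_ij and output distance Y_ij. Only pairs closer than a shrinking radius lam in the output space take part. The step neighbourhood function and the schedule parameters `n_epochs`, `alpha0` and `lam_end_ratio` are illustrative choices, not the values used in the study.

```python
import numpy as np


def lowpass(image, cutoff):
    """Keep spatial frequencies up to roughly `cutoff` cycles per image.

    Hypothetical helper: a Gaussian transfer function in the Fourier domain
    is an assumption, not the filter used in the study.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h) * h  # vertical frequency in cycles per image
    fx = np.fft.fftfreq(w) * w  # horizontal frequency in cycles per image
    radius = np.hypot(fy[:, None], fx[None, :])
    gain = np.exp(-(radius / cutoff) ** 2)  # gain 1 at DC, near 0 well above cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))


def cca(X, n_epochs=30, alpha0=0.5, lam_end_ratio=0.05, seed=0):
    """Project the rows of X onto a 2-D output space with curvilinear component analysis.

    Stochastic rule: pick a point i, then move every other y_j by
        alpha * F(Y_ij, lam) * (X_ij - Y_ij) * (y_j - y_i) / Y_ij,
    where F is a step neighbourhood function (1 if Y_ij < lam, else 0).
    The schedules for alpha and lam below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # input distances X_ij
    Y = rng.normal(scale=1e-3, size=(n, 2))  # small random start in the output plane
    lam0 = max(Dx.max(), 1e-12)  # start with a neighbourhood covering every pair
    total = n_epochs * n
    for t in range(total):
        frac = t / total
        alpha = alpha0 * (1.0 - frac)  # learning rate decays linearly to 0
        lam = lam0 * lam_end_ratio ** frac  # radius decays geometrically to lam_end_ratio * lam0
        i = rng.integers(n)
        diff = Y - Y[i]  # vectors y_j - y_i
        dy = np.maximum(np.linalg.norm(diff, axis=1), 1e-12)  # output distances Y_ij
        weight = (dy < lam).astype(float)  # step neighbourhood function F
        weight[i] = 0.0  # the chosen point itself does not move
        coef = alpha * weight * (Dx[i] - dy) / dy
        Y += coef[:, None] * diff  # push apart if too close, pull together if too far
    return Y
```

In a pipeline like the one described, each image might be passed through `lowpass`, flattened into one row of `X`, and mapped by `cca(X)` to one 2-D point per image. The sketch does not show that scene categories separate in this way. That is the empirical result the abstract reports. Because the neighbourhood radius shrinks over training, distant pairs eventually stop interacting. This is why CCA favours preserving local distances over global ones.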