Multiple Object Scene Description for the Visually Impaired Using Pre-trained Convolutional Neural Networks

This paper introduces a new method for multiple object scene description as part of a system to guide the visually impaired in an indoor environment. Here we are interested in a coarse scene description, in which only the presence of certain objects is indicated, regardless of their positions in the scene. The proposed method first extracts powerful features using a pre-trained convolutional neural network (CNN), then trains a neural network regressor to predict the content of any unknown scene from its CNN feature. We have found the CNN feature to be highly descriptive, even though the network is trained on auxiliary data from a completely different domain.
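The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature dimension, number of object classes, and network size below are assumed for demonstration, and random vectors stand in for the CNN features (which in the paper would come from an intermediate layer of a pre-trained CNN). Scikit-learn's `MLPRegressor` plays the role of the neural network regressor, and predicted scores are thresholded to obtain per-object presence labels.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Assumed sizes for illustration only: 200 training scenes, 512-d CNN
# features, 15 candidate indoor objects.
n_scenes, feat_dim, n_objects = 200, 512, 15

# Stand-ins for pre-trained CNN features (one row per scene) and the
# binary multi-label targets (1 = object present in the scene).
X = rng.normal(size=(n_scenes, feat_dim))
Y = rng.integers(0, 2, size=(n_scenes, n_objects))

# Train a neural network regressor mapping CNN features to the
# object-presence vector (multi-output regression).
reg = MLPRegressor(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
reg.fit(X, Y.astype(float))

# For an unseen scene, predict continuous scores per object and
# threshold them at 0.5 to decide which objects are present.
scores = reg.predict(X[:1])
present = (scores >= 0.5).astype(int)
print(present.shape)  # one presence flag per candidate object
```

In practice the threshold would be tuned on validation data, and the feature extractor would be a CNN trained on a large auxiliary dataset such as ImageNet.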
