Real-Time Indoor Scene Description for the Visually Impaired Using Autoencoder Fusion Strategies with Visible Cameras

This paper presents three coarse image description strategies intended to give visually impaired individuals a rough perception of the objects surrounding them in indoor spaces. The described algorithms operate on images captured by the user with a chest-mounted camera and output a list of objects that are likely present in the surrounding indoor scene. First, colour-, texture-, and shape-based feature descriptors are extracted, and a feature learning step is then applied by means of AutoEncoder (AE) models. Second, the learned features are fused and fed into a multilabel classifier that lists the candidate objects. The conducted experiments show that fusing a set of AE-learned features yields higher classification rates than using the features individually. Furthermore, compared with reference works, our method (i) achieves higher classification accuracies and (ii) runs at least four times faster, which makes a full real-time application feasible.
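The following is a minimal sketch of the three-stage pipeline outlined above (handcrafted descriptors, per-descriptor AE feature learning, fusion, multilabel classification). It assumes pre-extracted colour, texture, and shape feature vectors and a binary object-presence label matrix; the layer sizes, the concatenation-based fusion, and the one-vs-rest logistic-regression classifier are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch only: hyperparameters and the multilabel classifier are assumptions,
# not the configuration reported in the paper.
import numpy as np
from tensorflow.keras import layers, Model
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


def train_autoencoder(X, code_dim=64, epochs=50):
    """Train a single-hidden-layer autoencoder on X and return its encoder."""
    inp = layers.Input(shape=(X.shape[1],))
    code = layers.Dense(code_dim, activation="relu")(inp)
    out = layers.Dense(X.shape[1], activation="linear")(code)
    ae = Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(X, X, epochs=epochs, batch_size=32, verbose=0)
    return Model(inp, code)  # keep only the encoder part


def build_scene_describer(feature_sets, Y):
    """feature_sets: list of (n_images, d_i) handcrafted descriptor matrices
    (e.g. colour histograms, LBP, HOG); Y: (n_images, n_objects) binary
    object-presence matrix. Returns the trained encoders and classifier."""
    # One AE per descriptor family, as in the feature learning step
    encoders = [train_autoencoder(X) for X in feature_sets]
    # Fuse the AE-learned representations by simple concatenation
    Z = np.hstack([enc.predict(X, verbose=0)
                   for enc, X in zip(encoders, feature_sets)])
    # Multilabel classifier listing the objects likely present in the scene
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(Z, Y)
    return encoders, clf
```

At inference time, the same encoders would be applied to the descriptors of a newly captured image, the codes concatenated, and the classifier queried for the set of objects predicted as present.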
