Real-Time Indoor Scene Description for the Visually Impaired Using Autoencoder Fusion Strategies with Visible Cameras

This paper presents three coarse image description strategies intended to give visually impaired individuals a rough perception of the objects surrounding them in indoor spaces. The described algorithms operate on images captured by the user with a chest-mounted camera and output a list of objects that are likely present in the surrounding indoor scene. First, colour-, texture-, and shape-based feature descriptors are extracted, and a feature learning step is then applied by means of AutoEncoder (AE) models. Second, the learned features are fused and fed into a multilabel classifier that lists the candidate objects. The conducted experiments show that fusing a set of AE-learned features yields higher classification rates than using the features individually. Furthermore, compared with reference works, our method (i) achieves higher classification accuracies and (ii) runs at least four times faster, which makes a full real-time application feasible.
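The following is a minimal sketch of the three-stage pipeline outlined above (handcrafted descriptors, per-descriptor AE feature learning, fusion, multilabel classification). It assumes pre-extracted colour, texture, and shape feature vectors and a binary object-presence label matrix; the layer sizes, the concatenation-based fusion, and the one-vs-rest logistic-regression classifier are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch only: hyperparameters and the multilabel classifier are assumptions,
# not the configuration reported in the paper.
import numpy as np
from tensorflow.keras import layers, Model
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


def train_autoencoder(X, code_dim=64, epochs=50):
    """Train a single-hidden-layer autoencoder on X and return its encoder."""
    inp = layers.Input(shape=(X.shape[1],))
    code = layers.Dense(code_dim, activation="relu")(inp)
    out = layers.Dense(X.shape[1], activation="linear")(code)
    ae = Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(X, X, epochs=epochs, batch_size=32, verbose=0)
    return Model(inp, code)  # keep only the encoder part


def build_scene_describer(feature_sets, Y):
    """feature_sets: list of (n_images, d_i) handcrafted descriptor matrices
    (e.g. colour histograms, LBP, HOG); Y: (n_images, n_objects) binary
    object-presence matrix. Returns the trained encoders and classifier."""
    # One AE per descriptor family, as in the feature learning step
    encoders = [train_autoencoder(X) for X in feature_sets]
    # Fuse the AE-learned representations by simple concatenation
    Z = np.hstack([enc.predict(X, verbose=0)
                   for enc, X in zip(encoders, feature_sets)])
    # Multilabel classifier listing the objects likely present in the scene
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(Z, Y)
    return encoders, clf
```

At inference time, the same encoders would be applied to the descriptors of a newly captured image, the codes concatenated, and the classifier queried for the set of objects predicted as present.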
