DEEP-SEE: Joint Object Detection, Tracking and Recognition with Application to Visually Impaired Navigational Assistance

In this paper, we introduce the DEEP-SEE framework, which jointly exploits computer vision algorithms and deep convolutional neural networks (CNNs) to detect, track and recognize, in real time, objects encountered during navigation in outdoor environments. A first contribution is an object detection technique designed to localize both static and dynamic objects without any a priori knowledge about their position, type or shape. The methodological core of the proposed approach is a novel object tracking method based on two convolutional neural networks trained offline. The key principle consists of alternating between tracking using motion information and predicting the object location over time based on visual similarity. The tracking technique is validated on the standard VOT benchmark datasets and returns state-of-the-art results while minimizing computational complexity. The DEEP-SEE framework is then integrated into a novel assistive device designed to improve the cognition of visually impaired (VI) people and to increase their safety when navigating in crowded urban scenes. The assistive device is validated on a video dataset with 30 elements acquired with the help of VI users. The proposed system shows high accuracy (>90%) and robustness (>90%) scores regardless of the scene dynamics.
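
The alternation between motion-based tracking and similarity-based prediction can be illustrated with a minimal sketch. It is not the DEEP-SEE implementation: the function names, the confidence score, and the `min_confidence` threshold are hypothetical, and the two callables merely stand in for the two offline-trained CNNs, whose architectures and switching rule are not specified here.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

# Hypothetical interfaces standing in for the two offline-trained networks.
MotionStep = Callable[[object, object, Box], Tuple[Box, float]]
SimilarityStep = Callable[[object, Box], Box]


def track(
    frames: Sequence[object],
    init_box: Box,
    motion_step: MotionStep,
    similarity_step: SimilarityStep,
    min_confidence: float = 0.5,  # assumed threshold, not from the paper
) -> List[Box]:
    """Alternate between motion-based tracking and similarity-based prediction.

    For each new frame the motion model proposes a box together with a
    confidence. If the confidence is too low (for example under occlusion),
    the similarity model predicts the location from the last known box.
    """
    box = init_box
    trajectory = [box]
    for prev_frame, frame in zip(frames, frames[1:]):
        candidate, confidence = motion_step(prev_frame, frame, box)
        if confidence >= min_confidence:
            box = candidate  # motion information is trusted
        else:
            box = similarity_step(frame, box)  # fall back to visual similarity
        trajectory.append(box)
    return trajectory
```

Passing the two models as plain callables keeps the switching logic independent of how each network is built, so either one can be replaced without touching the loop.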
