Activating accessible pedestrian signals by voice using keyword spotting systems

Partially sighted pedestrians identify crossing the street as a difficult task, and pedestrian call buttons remain hard for them to activate. We believe that human-technology interfaces, such as speech technologies, can make this task easier. We propose a deep learning model for a keyword spotting system that lets pedestrians activate the pedestrian call button by voice, so that they can cross the street safely.
