Paper information - Activating accessible pedestrian signals by voice using keyword spotting systems

We describe the problem faced by partially sighted pedestrians, who identify crossing the street as a difficult task. Pedestrian call buttons remain hard for them to activate. We believe that human-technology interfaces such as speech-related technologies can make this task easier. We propose a deep learning model for a keyword spotting system that lets pedestrians activate the pedestrian call button by voice, enabling them to cross the street safely.

Carlos Cruz Corona | David A. Pelta | Mirzodaler Muhsinzoda | Jose Luis Verdegay
