Paper Information - Mobile Device-Based Speech Enhancement System Using Lip-Reading

This paper describes our preliminary study toward a new type of speech enhancement system. To avoid the odd-looking electrolarynx, we use lip reading instead. Our ultimate goal is a smartphone with a camera and audio output that converts lip motion into speech. We tested three image recognition methods: MLP, CNN, and MobileNets. Datasets of 3k images for training and testing were recorded from five people. The preliminary experiment indicated that MobileNets is the most suitable of the three for smartphone apps in terms of recognition accuracy and computational cost.
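
The computational-cost advantage attributed to MobileNets comes largely from replacing standard convolutions with depthwise separable ones. The sketch below compares the multiply-add counts of the two layer types. The function names and the layer sizes in the usage lines are illustrative assumptions and are not taken from the paper's experiments.

```python
def standard_conv_mult_adds(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of a standard convolution with a square kernel,
    assuming stride 1 and padding that keeps the fmap x fmap size."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def separable_conv_mult_adds(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of a depthwise separable convolution:
    one depthwise filter per input channel, then a 1x1 pointwise mix."""
    depthwise = kernel * kernel * in_ch * fmap * fmap
    pointwise = in_ch * out_ch * fmap * fmap
    return depthwise + pointwise


def cost_ratio(kernel, in_ch, out_ch, fmap):
    """Separable cost divided by standard cost.
    Algebraically this equals 1/out_ch + 1/kernel**2."""
    return (separable_conv_mult_adds(kernel, in_ch, out_ch, fmap)
            / standard_conv_mult_adds(kernel, in_ch, out_ch, fmap))


# Hypothetical layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
ratio = cost_ratio(kernel=3, in_ch=64, out_ch=128, fmap=56)
print(f"separable / standard = {ratio:.3f}")
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, which is why the saving matters on a device with limited compute.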
