Paper information - TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication

Despite promising initial studies, the speaker's original voice remains a problem when real-time voice conversion (data-driven speaker conversion) is applied in daily life, specifically in near-field communication: the original speech overlaps the converted speech and degrades the sense of immersion in the converted speech. We present TransVoice, a real-time voice conversion system that physically confines the original speech with a mask-shaped device. Our preliminary study shows that the proposed device significantly reduces the volume of the original speech, while an integrated filter that attenuates the low-frequency range mitigates the degradation in the conversion quality of the deep neural network (DNN). We discuss novel applications of TransVoice that can augment our communication.
