Noise-robust Prediction of Pronunciation Distances Aiming at Clustering of World Englishes Using a Learner's Self-centered Viewpoint

In recent years, the number of international tourists visiting Japan has been increasing, and Tokyo will host the Olympic Games in 2020. The default language for communicating with these tourists is English, but they speak it with a wide variety of accents. To enable smooth communication, we are developing a technical infrastructure that helps Japanese people become accustomed to variously accented Englishes (World Englishes). The infrastructure aims to cluster the large diversity of English pronunciations on an individual basis and to visualize that diversity in an educationally effective way. Clustering requires a technique that can predict the accent gap between any pair of speakers, and we developed one by integrating pronunciation structure analysis and support vector regression. In this paper, we evaluate the prediction performance in two settings: when the technique is applied to visualization from a user's self-centered viewpoint, and when it is combined with a noise suppression technique. Results show that the performance is comparable to that observed when phonemic, not phonetic, transcripts are used, and that an SNR of 10 dB is enough to guarantee the prediction performance achieved in a clean condition.