Improving the Noise Robustness of Prominence Detection for Children's Oral Reading Assessment

Reading skill is a critical component of basic literacy. We aim to develop an automated system for assessing the oral reading skills of primary school children learning English as a second language, which could eventually be valuable where teachers are in short supply, as is typical of rural areas in the country. This work focuses on the rating of prosody, an important aspect of fluency in speech delivery. In particular, a system for detecting word prominence from prosodic features is presented and tested on real-world data marked by the background noise typical of the school setting. To counteract the observed drop in prominence classification accuracy, two distinct approaches to noisy speech enhancement are evaluated for various types of background noise. A recently proposed Generative Adversarial Network (GAN) based method is found to be effective in suppressing noise while introducing only low levels of speech distortion, which minimally impact prosodic feature extraction. The implementation and training of the GAN system are discussed, and insights are provided on its performance relative to that of classical enhancement based on spectral subtraction.
