A real-time phoneme counting algorithm and application for speech rate monitoring.

Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but they cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most of these technologies are too complex and costly for home practice. We developed an algorithm and a smartphone application that monitor a patient's speaking rate in real time and provide user-friendly feedback to both patient and therapist. Speaking rate is computed by a phoneme counting algorithm that uses spectral transition measure extraction to estimate phoneme boundaries. The algorithm runs in real time in a mobile application with two modes: one gives the patient visual feedback on their speech rate for self-practice, and the other gives the speech therapist recordings, speech rate analysis and tools to manage the patient's practice. The algorithm's phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal and fast paces, and its counts were compared with manual counts made by speech experts. Test-retest and intra-counter reliability were also assessed. Preliminary results indicate differences of -4% to 11% between automatic and human phoneme counting, with the largest differences for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice.
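As an illustration of how a spectral transition measure can be turned into a boundary count, the following is a minimal sketch and not the authors' implementation. It assumes each frame is already a short feature vector (for example cepstral coefficients from some front end), computes the per-frame measure as the mean squared regression slope of each feature over a small window, and counts local maxima above a threshold as candidate phoneme boundaries. The function names, the window half-width, the threshold, the minimum peak spacing and the 10 ms frame step are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def spectral_transition_measure(
    frames: Sequence[Sequence[float]], half_window: int = 2
) -> List[float]:
    """Mean squared regression slope of each feature over +/- half_window frames.

    Frames too close to either edge to fit a full window are left at 0.0.
    """
    n_frames = len(frames)
    n_feats = len(frames[0]) if n_frames else 0
    offsets = range(-half_window, half_window + 1)
    denom = sum(k * k for k in offsets)  # normaliser of the least-squares slope
    stm = [0.0] * n_frames
    for t in range(half_window, n_frames - half_window):
        total = 0.0
        for d in range(n_feats):
            slope = sum(k * frames[t + k][d] for k in offsets) / denom
            total += slope * slope
        stm[t] = total / n_feats
    return stm


def count_boundaries(
    stm: Sequence[float], threshold: float, min_gap: int = 3
) -> int:
    """Count local maxima above threshold that are at least min_gap frames apart."""
    count = 0
    last_peak = -min_gap
    for t in range(1, len(stm) - 1):
        is_peak = stm[t] > stm[t - 1] and stm[t] >= stm[t + 1]
        if is_peak and stm[t] > threshold and t - last_peak >= min_gap:
            count += 1
            last_peak = t
    return count


def boundaries_per_second(
    n_boundaries: int, n_frames: int, frame_step_s: float = 0.01
) -> float:
    """Boundary rate, used here as a rough proxy for phonemes per second."""
    duration = n_frames * frame_step_s
    return n_boundaries / duration if duration > 0 else 0.0


# Toy input: three steady segments of 2-dimensional features, so two transitions.
toy_frames = [[0.0, 0.0]] * 10 + [[1.0, 1.0]] * 10 + [[0.0, 0.0]] * 10
toy_stm = spectral_transition_measure(toy_frames)
toy_count = count_boundaries(toy_stm, threshold=0.05)
toy_rate = boundaries_per_second(toy_count, len(toy_frames))
```

On the toy input the measure is zero inside each steady segment and rises only around the two segment changes, which is the behaviour a boundary detector relies on. With real speech the threshold and minimum spacing would need tuning, and slow speech with long steady segments is a plausible place for such a detector to drift from manual counts.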
