Automatic classification of speech dysfluencies in continuous speech based on similarity measures and morphological image processing tools

Abstract Speech-language pathologists traditionally count speech dysfluencies to measure stuttering severity. Such subjective assessment is time-consuming and depends heavily on the clinician's experience. The present study proposes an objective evaluation of speech dysfluencies (sound prolongation and syllable/word/phrase repetition) in continuous speech signals. The proposed method looks for similarity between successive frames of speech features to detect sound prolongation, and between nearby segments of speech to detect repetition. Speech signals are first parameterized as MFCC, PLP, or filter bank energy feature sets. A similarity matrix is then computed from the similarities of all pairs of frames, using either a cross-correlation or a Euclidean criterion. The similarity matrix is treated as an image, and highly similar components are extracted by applying a suitable threshold. Morphological image processing tools then remove irrelevant parts of the similarity matrix, and the dysfluent parts are detected. The effects of different feature sets and similarity measures on the classification results were examined. Classification accuracies of 99.84%, 98.07%, and 99.87% were achieved for the detection of prolongation, syllable/word repetition, and phrase repetition, respectively.
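The prolongation branch of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy one-dimensional "features" (standing in for MFCC, PLP, or filter bank vectors), the threshold value, and the square structuring element are all assumptions made for illustration. The sketch builds a Euclidean distance matrix over all frame pairs, thresholds it into a binary image, and applies a morphological opening so that only square blocks of mutually similar successive frames survive. Repetition detection, which would look for off-diagonal structure, is not covered.

```python
from math import dist


def distance_matrix(frames):
    """Euclidean distance between every pair of frame feature vectors."""
    return [[dist(a, b) for b in frames] for a in frames]


def binarize(matrix, threshold):
    """Mark frame pairs whose distance is at or below the threshold as similar."""
    return [[d <= threshold for d in row] for row in matrix]


def opening(img, k):
    """Binary morphological opening with a k x k square structuring element.

    The result is the union of every k x k square that fits entirely
    inside the foreground, so thinner structures (such as the trivial
    main diagonal) are removed.
    """
    n = len(img)
    out = [[False] * n for _ in range(n)]
    for i in range(n - k + 1):
        for j in range(n - k + 1):
            if all(img[i + di][j + dj] for di in range(k) for dj in range(k)):
                for di in range(k):
                    for dj in range(k):
                        out[i + di][j + dj] = True
    return out


def prolonged_frames(frames, threshold, min_len):
    """Indices of frames inside a run of at least min_len mutually similar frames.

    threshold and min_len are hypothetical tuning parameters.
    """
    similar = binarize(distance_matrix(frames), threshold)
    opened = opening(similar, min_len)
    return [i for i in range(len(frames)) if opened[i][i]]


# Toy input: frames 2..5 barely change, imitating a prolonged sound.
frames = [[0.0], [5.0], [10.0], [10.1], [10.2], [10.1], [20.0], [25.0]]
print(prolonged_frames(frames, threshold=0.5, min_len=3))
```

In this toy input, frames 2 to 5 form a 4 x 4 block of similar pairs that survives the opening, while the isolated diagonal entries of the other frames do not. A real system would replace the toy vectors with per-frame acoustic features and would need to tune the threshold and the structuring element on annotated data.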
