Automatic initial and final segmentation in cleft palate speech of Mandarin speakers

Speech unit segmentation is an important preprocessing step in the analysis of cleft palate speech. In Mandarin, each syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur in the finals and the voiced initials, while articulation disorders occur in the unvoiced initials. Initials and finals are therefore the smallest speech units that reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed. The cleft palate speech utterances tested were collected from the Cleft Palate Speech Treatment Center at the Hospital of Stomatology, Sichuan University, which has the largest number of cleft palate patients in China. The cleft palate speech data comprise 824 speech segments, and the control samples comprise 228 speech segments. Syllables are first extracted from the speech utterances. The proposed syllable extraction method requires no training stage and performs well on both voiced and unvoiced speech. The syllables are then classified as having either “quasi-unvoiced” or “quasi-voiced” initials, and a separate initial/final segmentation method is proposed for each of these two syllable types. Moreover, segmentation is performed in two steps: the rough locations of the syllable and initial/final boundaries found in the first step are refined in the second, in order to improve the robustness and accuracy of the segmentation. The experiments show that initial/final segmentation is more accurate for syllables with quasi-unvoiced initials than for those with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 over all syllables is 91.69%. For the control samples, P30 over all syllables is 91.24%.
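The two figures of merit quoted above can be illustrated with a minimal sketch. The abstract does not define P30, so the code assumes it is the percentage of detected boundaries whose absolute time error against a manually labelled reference is at most 30 ms. The function name `boundary_metrics`, its parameters, and the boundary values are hypothetical and are not data from the study.

```python
from statistics import mean


def boundary_metrics(reference_ms, detected_ms, tolerance_ms=30.0):
    """Return (mean absolute time error in ms, percentage within tolerance).

    Assumption: P30 is read as the share of boundaries whose absolute
    error is at most 30 ms; the abstract does not state the definition.
    """
    if not reference_ms or len(reference_ms) != len(detected_ms):
        raise ValueError("need two non-empty boundary lists of equal length")
    # Absolute deviation of each detected boundary from its reference.
    errors = [abs(d - r) for r, d in zip(reference_ms, detected_ms)]
    within = sum(e <= tolerance_ms for e in errors)
    return mean(errors), 100.0 * within / len(errors)


# Hypothetical initial/final boundary positions in milliseconds.
reference = [120.0, 410.0, 655.0, 900.0]
detected = [124.0, 405.0, 690.0, 910.0]

mean_error, p30 = boundary_metrics(reference, detected)
print(f"mean time error: {mean_error} ms, P30: {p30}%")
```

In this toy case one boundary is off by 35 ms, so it raises the mean time error and is counted as a miss under the assumed 30 ms tolerance.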
