Persian speech sentence segmentation without speech recognition

In this paper, we propose a method for detecting sentence boundaries in Persian speech using a set of prosodic features and the spectral centroid. The proposed method does not use a speech recognizer. Silent regions are first detected using four features: spectral centroid, zero crossing rate, energy and pitch. Twelve prosodic features are then extracted from each silent region. A silent region may correspond either to a sentence boundary or to a pause inside a sentence. Features are extracted from the silent regions of speech data from a number of speakers, and each region is labeled as a sentence-boundary silence or a within-sentence silence. These labeled feature vectors are used to train a nonlinear support vector machine (SVM) classifier, which is then evaluated on the detection of Persian sentence boundaries. The proposed algorithm was evaluated on six speakers from the Large FARSDAT dataset. It achieved an F-measure of 82.4% on a test set whose speakers all appeared in the training data, and an F-measure of 73.02% on speakers not included in the training data.
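The silence-detection step can be illustrated with a minimal frame-level sketch. It computes short-time energy, zero crossing rate and spectral centroid for each frame and groups consecutive low-energy frames into silent regions. This is not the authors' implementation: the function names, frame length, hop, energy threshold, minimum run length and the energy-only decision rule are all hypothetical choices, because the abstract does not state how the four features are combined. Pitch is omitted entirely.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive O(N^2) DFT."""
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        magnitude = math.hypot(re, im)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


def detect_silent_regions(signal, sample_rate, frame_len=160, hop=80,
                          energy_thresh=1e-4, min_frames=3):
    """Return (start_s, end_s, mean_zcr, mean_centroid_hz) for each run of
    at least `min_frames` consecutive low-energy frames.

    HYPOTHETICAL decision rule: a frame is silent when its short-time energy
    is below `energy_thresh`; ZCR and spectral centroid are only summarised
    per region. Pitch is not used in this sketch.
    """
    frames = frame_signal(signal, frame_len, hop)
    silent = [short_time_energy(f) < energy_thresh for f in frames]

    regions, start = [], None
    for idx, is_silent in enumerate(silent + [False]):  # sentinel closes a trailing run
        if is_silent and start is None:
            start = idx
        elif not is_silent and start is not None:
            if idx - start >= min_frames:
                run = frames[start:idx]
                regions.append((
                    start * hop / sample_rate,
                    ((idx - 1) * hop + frame_len) / sample_rate,
                    sum(zero_crossing_rate(f) for f in run) / len(run),
                    sum(spectral_centroid(f, sample_rate) for f in run) / len(run),
                ))
            start = None
    return regions


if __name__ == "__main__":
    sr = 8000
    # 0.1 s of a 200 Hz tone, 0.1 s of digital silence, then the tone again
    tone = [0.5 * math.sin(2 * math.pi * 200 * t / sr) for t in range(800)]
    print(detect_silent_regions(tone + [0.0] * 800 + tone, sr))
    # prints [(0.1, 0.2, 0.0, 0.0)]
```

Each returned region is a candidate boundary; in the method described above, its prosodic features would then be passed to the trained SVM to decide whether it is a sentence boundary or a within-sentence pause, a step this sketch does not attempt.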

Mohammad Mehdi Homayounpour | Hoda Sadat Jafari