Persian speech sentence segmentation without speech recognition

In this paper, we propose a method for detecting sentence boundaries in Persian speech using a set of prosodic features and the spectral centroid. The proposed method does not use a speech recognizer. Silent regions are first detected using four features: spectral centroid, zero crossing rate, energy, and pitch. Twelve prosodic features are then extracted from each silent region. A silent region may correspond to a sentence boundary or may fall inside a sentence. The features of the silent regions in speech data from several speakers are extracted, and each region is labeled as either a boundary silence or a within-sentence silence. A nonlinear support vector machine (SVM) classifier is trained on these feature vectors and then evaluated on the detection of Persian speech sentence boundaries. The proposed algorithm was evaluated on six speakers from the Large FARSDAT data set. It achieved an F-measure of 82.4% on a test set drawn from the speakers in the training data and an F-measure of 73.02% on speakers outside the training data.
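As a rough illustration of the silence-detection stage and of the reported metric, the following minimal sketch computes three of the four frame-level features named above (energy, zero crossing rate, spectral centroid) and marks low-energy runs as candidate silent regions. It is not the authors' implementation: the function names, the frame length and hop, the relative energy threshold, and the minimum region length are all assumptions chosen for illustration, the decision rule here uses energy alone, and pitch, the twelve prosodic features, and the SVM classifier are not shown.

```python
import numpy as np

def frame_features(signal, sr, frame_len=400, hop=160):
    """Per-frame energy, zero crossing rate and spectral centroid (Hz).

    frame_len and hop are illustrative assumptions (25 ms / 10 ms at 16 kHz).
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    energy = np.empty(n_frames)
    zcr = np.empty(n_frames)
    centroid = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energy[i] = np.mean(frame ** 2)
        # Fraction of adjacent sample pairs whose sign differs.
        zcr[i] = np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))
        # Magnitude-weighted mean frequency of the frame's spectrum.
        mag = np.abs(np.fft.rfft(frame))
        total = mag.sum()
        centroid[i] = (freqs * mag).sum() / total if total > 0 else 0.0
    return energy, zcr, centroid

def silent_regions(energy, rel_threshold=0.05, min_frames=10):
    """Return (start, end) frame indices (end exclusive) of low-energy runs.

    Assumed rule: a frame is silent if its energy is below
    rel_threshold times the maximum frame energy.
    """
    silent = energy < rel_threshold * energy.max()
    regions, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if i - start >= min_frames:
                regions.append((start, i))
            start = None
    if start is not None and len(silent) - start >= min_frames:
        regions.append((start, len(silent)))
    return regions

def f_measure(tp, fp, fn):
    """Balanced F-measure from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In a full pipeline along the lines described above, each region returned by `silent_regions` would be turned into a feature vector and passed to a trained classifier that decides whether it is a sentence boundary; `f_measure` would then be applied to the counts of correctly and incorrectly detected boundaries.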