Speech/non-speech segments detection based on chaotic and prosodic features

Every speech recognition system contains a speech/non-speech detection stage, and only the segments detected as speech are passed on to the recognition stage. In a noisy environment, non-speech segments can be an important source of recognition errors. In this work, we introduce a new speech/non-speech detection system based on fractal dimension and prosodic features, combined with the commonly used MFCC features. We evaluate the system using neural network and SVM classifiers on the TIMIT speech database with an HMM-based speech recognizer. Experimental results show very good performance in speech/non-speech detection.