An improved speech detection algorithm for isolated Korean utterances

A new, simple speech detection algorithm is implemented and applied to isolated Korean utterances with encouraging results. The algorithm uses decision parameters and threshold values based on three features: the logarithmic energy, the zero-crossing rate, and the modified zero-crossing rate. All threshold values are fixed during training by applying a discrete optimization technique to a prepared training data set. The vocabulary for the experiment comprises Korean digits and some other control commands designed for an automatic dialing system. Tested on 384 utterances from three male and three female speakers, the algorithm yields an average error of 4.6 ms, whereas the well-known endpoint detection algorithm of Rabiner and Sambur gives an average error of 18.0 ms on the same speaker-independent data set. It is also shown that the new algorithm can be improved further by training separately for male and female speakers.
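
As an illustration of two of the features named above, the following is a minimal sketch of frame-wise logarithmic energy and zero-crossing rate, combined with a simple threshold rule for locating endpoints. It is not the algorithm of this paper: the abstract does not define the modified zero-crossing rate, the decision parameters, or the decision logic, so the modified rate is omitted and the rule in `detect_endpoints` is an assumption made only for demonstration. All function names, parameter names, and threshold values are hypothetical, and the thresholds are set by hand here rather than by the discrete optimization described above.

```python
import math


def frames(samples, frame_len, hop):
    """Split a sample sequence into frames of frame_len, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frame, floor=1e-10):
    """Short-time log energy in dB; the floor avoids log(0) on digital silence."""
    return 10.0 * math.log10(floor + sum(x * x for x in frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(samples, frame_len, hop, energy_db, zcr_min):
    """Return (first, last) indices of frames judged to be speech, or None.

    ASSUMED rule, for illustration only: a frame counts as speech when its
    log energy exceeds energy_db or its zero-crossing rate exceeds zcr_min.
    """
    active = [i for i, f in enumerate(frames(samples, frame_len, hop))
              if log_energy(f) > energy_db or zero_crossing_rate(f) > zcr_min]
    return (active[0], active[-1]) if active else None


# Synthetic test signal (not real speech): silence, a sinusoid, silence.
silence = [0.0] * 400
tone = [math.sin(2 * math.pi * n / 20) for n in range(800)]
signal = silence + tone + silence

# Hand-picked thresholds; frames 4 through 11 contain the sinusoid.
print(detect_endpoints(signal, 100, 100, energy_db=0.0, zcr_min=0.5))
```

With non-overlapping 100-sample frames, the sinusoid occupies frames 4 through 11, which is the pair of indices the sketch reports. In a trained system the two thresholds would instead be chosen to minimize endpoint error over labeled training utterances.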