Neural network boundary refining for automatic speech segmentation

This work extends a previous study that proposed an automatic speech segmentation and labeling system based on a hidden Markov model (HMM) speech recognizer followed by a fuzzy-logic boundary correction system. In this paper we explore replacing that fuzzy-logic system, which is difficult to design, with a neural network (NN) based system that can be trained automatically. First, the whole fuzzy-logic boundary correction system, which used a different rule set for each kind of phonetic transition, is replaced by a single NN. Results show that this single NN outperforms the complete fuzzy-logic system. Then, we explore using different NNs, each specialized in one kind of phonetic transition. The results are again clearly better than those obtained with the fuzzy-logic system, but not clearly better than those obtained with a single NN.
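The two configurations compared above (a single NN versus one NN per kind of phonetic transition) can be illustrated with a minimal sketch. It is not the system described in this paper: it assumes, purely for illustration, that each trained network can be treated as a function mapping features taken around an HMM boundary to a signed time correction in milliseconds. The names `refine_boundaries`, `general_nn`, `vowel_stop_nn`, the transition labels, and the feature values are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumption: a trained corrector maps features extracted around an HMM
# boundary to a signed time correction in milliseconds.
Corrector = Callable[[Sequence[float]], float]

# One HMM boundary: (time in ms, transition class label, features).
Boundary = Tuple[float, str, Sequence[float]]


def refine_boundaries(
    boundaries: List[Boundary],
    general: Corrector,
    specialized: Optional[Dict[str, Corrector]] = None,
) -> List[float]:
    """Return corrected boundary times in milliseconds.

    If a specialized corrector is registered for a boundary's transition
    class, it is used; otherwise the single general corrector is applied.
    """
    specialized = specialized or {}
    refined = []
    for hmm_time, transition, features in boundaries:
        corrector = specialized.get(transition, general)
        refined.append(hmm_time + corrector(features))
    return refined


# Toy stand-ins for trained networks (fixed linear maps, not real models).
def general_nn(features: Sequence[float]) -> float:
    return 2.0 * features[0]


def vowel_stop_nn(features: Sequence[float]) -> float:
    return -5.0 * features[0]


# Hypothetical HMM output: two boundaries with made-up features.
hmm_output = [
    (120.0, "vowel-stop", [1.0]),
    (310.0, "stop-vowel", [0.5]),
]

# Configuration 1: a single NN handles every transition.
single = refine_boundaries(hmm_output, general_nn)

# Configuration 2: specialized NNs where available, general NN otherwise.
per_class = refine_boundaries(
    hmm_output, general_nn, {"vowel-stop": vowel_stop_nn}
)
```

In this sketch the only structural difference between the two configurations is the lookup table of specialized correctors, so both can share the same HMM front end and the same refinement loop.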