Distinctive phonetic feature (DPF) based phone segmentation using hybrid neural networks

Segmentation of speech into its corresponding phones has become an important issue in many speech processing areas, such as speech recognition, speech analysis, speech synthesis, and speech databases. In this paper, for accurate segmentation in speech recognition applications, we introduce Distinctive Phonetic Feature (DPF) based feature extraction using a two-stage NN (Neural Network) system consisting of an RNN (Recurrent Neural Network) in the first stage and an MLN (Multi-Layer Neural Network) in the second stage. The RNN maps continuous acoustic features, Local Features (LFs), onto discrete DPF patterns, while the MLN constrains the DPF context, or dynamics, within an utterance. The experiments are carried out using JNAS (Japanese Newspaper Article Sentences) continuous utterances that contain both vowels and consonants. By resolving coarticulation effects, the proposed DPF-based feature extractor provides good segmentation and a high recognition rate with a reduced mixture set of HMMs (Hidden Markov Models).
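
The two-stage data flow described above can be illustrated with a minimal sketch. It is not the trained system: the weights are random, the layer sizes are arbitrary, and the Elman-style recurrence, the context window of k neighbouring frames fed to the MLN, and the rule that places a boundary wherever the binarized DPF pattern changes are all assumptions introduced here for illustration. The function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_stage(lf, w_in, w_rec, w_out):
    """Stage 1 (assumed Elman-style): LF frames (T, n_lf) -> DPF estimates (T, n_dpf)."""
    hidden = np.zeros(w_rec.shape[0])
    outputs = []
    for frame in lf:
        # The hidden state carries information from earlier frames.
        hidden = np.tanh(frame @ w_in + hidden @ w_rec)
        outputs.append(sigmoid(hidden @ w_out))
    return np.array(outputs)

def stack_context(dpf, k):
    """Concatenate each frame with its k left and k right neighbours (edge frames repeated)."""
    padded = np.pad(dpf, ((k, k), (0, 0)), mode="edge")
    n_frames = dpf.shape[0]
    return np.hstack([padded[i:i + n_frames] for i in range(2 * k + 1)])

def mln_stage(dpf, w_hid, w_out, k):
    """Stage 2 (assumed context window): refine DPF estimates from neighbouring frames."""
    context = stack_context(dpf, k)
    return sigmoid(np.tanh(context @ w_hid) @ w_out)

def boundaries(dpf, threshold=0.5):
    """Assumed segmentation rule: frames whose binarized DPF pattern differs from the previous frame."""
    binary = dpf > threshold
    changed = np.any(binary[1:] != binary[:-1], axis=1)
    return np.flatnonzero(changed) + 1

# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
n_frames, n_lf, n_hidden, n_dpf, k = 20, 8, 6, 5, 1

lf = rng.normal(size=(n_frames, n_lf))  # stand-in for real LF frames
w_in = rng.normal(size=(n_lf, n_hidden))
w_rec = rng.normal(size=(n_hidden, n_hidden))
w_out1 = rng.normal(size=(n_hidden, n_dpf))
w_hid = rng.normal(size=((2 * k + 1) * n_dpf, n_hidden))
w_out2 = rng.normal(size=(n_hidden, n_dpf))

dpf_raw = rnn_stage(lf, w_in, w_rec, w_out1)
dpf_refined = mln_stage(dpf_raw, w_hid, w_out2, k)
cuts = boundaries(dpf_refined)
print("DPF shape:", dpf_refined.shape, "candidate boundaries:", cuts)
```

With untrained weights the boundaries carry no phonetic meaning; the sketch only shows how frame-level LFs pass through the recurrent stage, are widened with context for the second stage, and yield candidate segment boundaries from changes in the DPF pattern.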