Tone recognition of continuous speech of standard Chinese using neural network and tone nucleus model

A method is developed for recognizing the lexical tone types of Standard Chinese syllables in continuous speech. A neural network (four-layer perceptron) is adopted as the classifier. The method consists of two steps: tone types are first recognized using prosodic features of the voiced part of each syllable, and are then re-recognized by looking only at the tone nucleus, the portion of the syllable whose fundamental frequency (F0) contour is rather stable regardless of the tone types of the preceding and following syllables. The voiced part (or tone nucleus) is divided into 20 segments, and the F0, delta-F0, F0 slope, and short-term energy of each segment serve as inputs to the neural network. To cope with tone coarticulation, prosodic feature parameters for the last 5 segments of the preceding syllable and the first 5 segments of the following syllable are included in the neural network inputs. Information on syllable length is also added to the inputs. A tone recognition experiment was conducted on a female speaker's utterances from the HKU96 corpus. The average recognition rate, including neutral-tone syllables, was 86.5% when the tone nucleus model was not used, and increased to 86.9% when the model was used. The obtained rate is more than 3 percentage points higher than that of the hidden-Markov-model-based tone recognizer previously developed by the authors.

Index Terms: tone recognition, tone nucleus model, neural network, Standard Chinese
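
The input layout described above can be illustrated with a minimal sketch. Only the counts (20 segments, 5 context segments on each side, four features per segment, plus syllable length) come from the abstract; everything else is an assumption made for illustration: the function names `segment_features` and `build_input`, the resampling to a fixed number of points per segment (`PTS`), the definition of delta-F0 as the difference between consecutive segment means, the least-squares slope, and the use of frame count as the length feature. It is not the authors' implementation.

```python
import numpy as np

N_SEG = 20  # segments per voiced part or tone nucleus (from the abstract)
N_CTX = 5   # context segments taken from each neighbouring syllable
PTS = 4     # hypothetical number of resampled points per segment


def _resample(contour, n_points):
    """Linearly interpolate a 1-D contour onto n_points equally spaced points."""
    contour = np.asarray(contour, dtype=float)
    grid = np.linspace(0, len(contour) - 1, n_points)
    return np.interp(grid, np.arange(len(contour)), contour)


def segment_features(f0, energy, n_seg=N_SEG):
    """Return an (n_seg, 4) array: mean F0, delta-F0, F0 slope, mean energy."""
    f0_r = _resample(f0, n_seg * PTS).reshape(n_seg, PTS)
    en_r = _resample(energy, n_seg * PTS).reshape(n_seg, PTS)
    mean_f0 = f0_r.mean(axis=1)
    # Assumed definition: difference between consecutive segment means
    # (the first segment has no predecessor, so its delta is 0).
    delta_f0 = np.diff(mean_f0, prepend=mean_f0[0])
    # Least-squares slope within each segment, using a centred time axis.
    x = np.arange(PTS) - (PTS - 1) / 2
    slope = (f0_r * x).sum(axis=1) / (x ** 2).sum()
    return np.column_stack([mean_f0, delta_f0, slope, en_r.mean(axis=1)])


def build_input(prev_feats, cur_feats, next_feats, n_frames):
    """Concatenate context segments, current segments, and syllable length."""
    feats = np.vstack([prev_feats[-N_CTX:], cur_feats, next_feats[:N_CTX]])
    return np.concatenate([feats.ravel(), [float(n_frames)]])
```

Under these assumptions the network input has (5 + 20 + 5) × 4 + 1 = 121 dimensions. How syllables at utterance boundaries are handled is not stated in the abstract; one option would be to pass an all-zero feature array for a missing neighbour.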