A speech labeling system based on knowledge processing

Segmenting continuous speech into phonemes, i.e., labeling, is one of the most important steps in constructing a speech database. At present, labeling is performed by a human expert who inspects the time series of acoustic parameters, so constructing a speech database requires a large amount of time and effort. To reduce this burden, the authors have developed an automatic speech labeling system. In the proposed system, a correspondence is first established between the phoneme symbol sequence and the characteristic changes in the time course of the acoustic parameters, which determines the rough positions of the phonemes. The boundaries between phonemes are then adjusted by detailed observation of the changes in the acoustic parameters near each boundary. Finally, the change of the acoustic parameters within each phoneme interval is examined to verify that the label matches. The system is evaluated experimentally. For eight sentences uttered by seven adult males who participated in the development of the rules, 6.7 percent of the phoneme boundaries are rejected; of the remaining boundaries, 99.1 percent are set within 30 ms of the positions previously decided by the human expert. For 10 sentences uttered by four adult males who did not participate in the rule development, the corresponding figures are 8.9 and 95.5 percent, respectively. These results indicate the usefulness of the proposed system.
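The two evaluation figures reported above (the share of rejected boundaries, and the share of remaining boundaries placed within 30 ms of the expert's position) can be computed along the following lines. This is only a minimal sketch, not the authors' evaluation code: the function name `score_boundaries`, the use of `None` to mark a rejected boundary, the inclusive treatment of the 30 ms limit, and the sample boundary times are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def score_boundaries(
    auto_ms: Sequence[Optional[float]],
    ref_ms: Sequence[float],
    tolerance_ms: float = 30.0,
) -> Tuple[float, float]:
    """Return (rejection rate, within-tolerance rate), both in percent.

    auto_ms holds the automatically placed boundary times in milliseconds,
    with None marking a boundary the system rejected (an assumed convention).
    ref_ms holds the expert's boundary times for the same boundaries.
    """
    if len(auto_ms) != len(ref_ms):
        raise ValueError("auto_ms and ref_ms must have the same length")
    if not ref_ms:
        raise ValueError("at least one boundary is required")

    # Keep only the boundaries the system did not reject.
    accepted = [(a, r) for a, r in zip(auto_ms, ref_ms) if a is not None]
    rejection_rate = 100.0 * (len(ref_ms) - len(accepted)) / len(ref_ms)

    if not accepted:
        return rejection_rate, 0.0

    # Count accepted boundaries within the tolerance (inclusive, by assumption).
    within = sum(1 for a, r in accepted if abs(a - r) <= tolerance_ms)
    return rejection_rate, 100.0 * within / len(accepted)


# Made-up boundary times, for illustration only.
auto = [100.0, None, 250.0, 410.0, 600.0]
ref = [110.0, 180.0, 245.0, 450.0, 590.0]
rejection, within_30ms = score_boundaries(auto, ref)
print(rejection, within_30ms)  # 20.0 75.0
```

Note that the second figure is computed over the accepted boundaries only, mirroring the way the abstract reports accuracy "for the remaining phoneme boundaries" rather than over all boundaries.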