Prediction of Major Phrase Boundary Location and Pause Insertion Using a Stochastic Context-free Grammar

In this paper, we present models that predict major phrase boundary locations and pause insertion from an input part-of-speech (POS) sequence using a stochastic context-free grammar (SCFG). The two models share the same design because major phrase boundaries and pauses have similar characteristics. In both models, word attributes and left/right-branching probability parameters, which represent stochastic phrasing characteristics, are the inputs to a feed-forward neural network that makes the prediction. To obtain the probabilities, major phrase characteristics and pause characteristics are first learned by training the SCFG with the inside-outside algorithm. The probability of each bracketing structure is then computed using the trained SCFG. Experiments were carried out to confirm the effectiveness of these stochastic models for predicting major phrase boundary locations and pause locations. In a test on unseen data, 92.9% of the major phrase boundaries were correctly predicted with a 16.9% false insertion rate. For pause prediction on unseen data, 85.2% of the pause boundaries were correctly predicted with a 9.1% false insertion rate.
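
To make the probability computation concrete, the following is a minimal sketch of the inside pass, the half of the inside-outside algorithm that sums the probabilities of all bracketings of a POS sequence under an SCFG. It is an illustration only, not the implementation used in this work: it assumes a grammar in Chomsky normal form, and the function name `inside_probability`, the rule-dictionary layout, and the toy grammar are all hypothetical.

```python
from collections import defaultdict


def inside_probability(pos_tags, lexical, binary, start="S"):
    """Probability that `start` derives `pos_tags` under a CNF SCFG.

    lexical: {(lhs, terminal): prob} for rules lhs -> terminal   (assumed layout)
    binary:  {(lhs, left, right): prob} for rules lhs -> left right
    """
    n = len(pos_tags)
    # beta[(i, j)][A] = probability that nonterminal A derives pos_tags[i:j]
    beta = defaultdict(lambda: defaultdict(float))

    # Spans of width 1 are covered by lexical rules.
    for i, tag in enumerate(pos_tags):
        for (lhs, terminal), prob in lexical.items():
            if terminal == tag:
                beta[(i, i + 1)][lhs] += prob

    # Wider spans sum over every split point and every binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, left, right), prob in binary.items():
                    left_p = beta[(i, k)].get(left, 0.0)
                    right_p = beta[(k, j)].get(right, 0.0)
                    if left_p and right_p:
                        beta[(i, j)][lhs] += prob * left_p * right_p

    return beta[(0, n)].get(start, 0.0)


# Toy ambiguous grammar (hypothetical): S -> S S (0.5), S -> "w" (0.5).
toy_lexical = {("S", "w"): 0.5}
toy_binary = {("S", "S", "S"): 0.5}
total = inside_probability(["w", "w", "w"], toy_lexical, toy_binary)
```

In the toy grammar, the three-symbol sequence has two bracketings, one left-branching and one right-branching, and `total` is the sum of their probabilities. How the left/right-branching parameters fed to the neural network are derived from such quantities is not specified here, so the sketch stops at the span probabilities.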