A logistic regression model for detecting prominences

This paper describes the development of a model for identifying points of prominence in speech. The model can serve as a first step in the intonational labeling of corpora used in some speech synthesis systems (A. Black and P. Taylor, 1995). Under the working definition adopted here, the starred ToBI accents (K. Silverman et al., 1992), that is, H*, L*, L*+H, L+H*, and H+!H*, are prominent. The prominence detection model is based on the sums-of-products vowel duration model (J.P.H. van Santen, 1992). It was trained and tested on different portions of the Boston University Radio News corpus and achieves 86.3% correct identification with 12.5% false detection. These results are comparable to those of previous work (C.W. Wightman and W.N. Campbell, 1995): 85.9% correct identification with 10.7% false detection. The advantage of this model is that it can be trained quickly on as few as 600 data points, reducing the need for large corpora.
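
To make the two reported figures concrete, the following is a minimal sketch of a logistic regression detector and of how a correct identification rate and a false detection rate might be computed. It is not the paper's method: it uses a plain logistic regression rather than the sums-of-products formulation, the two features (normalized vowel duration and energy) are hypothetical, the data are synthetic, and the metric definitions (detected fraction of prominent items, flagged fraction of non-prominent items) are an assumption about how the rates are measured. The training set size of 600 simply mirrors the figure quoted above.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n, seed):
    # Synthetic stand-in for labeled syllables (NOT the Radio News corpus).
    # Hypothetical features: z-scored vowel duration and z-scored energy.
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)          # 1 = prominent, 0 = not prominent
        mean = 1.0 if y == 1 else -1.0
        xs.append([rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)])
        ys.append(y)
    return xs, ys


def train_logistic(xs, ys, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log-loss; the bias is stored last.
    n_features = len(xs[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            err = p - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def predict(w, x, threshold=0.5):
    # Label a point prominent when the modeled probability reaches the threshold.
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
    return int(p >= threshold)


def detection_rates(y_true, y_pred):
    # Assumed definitions:
    #   correct identification = detected prominent / all prominent
    #   false detection        = flagged non-prominent / all non-prominent
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    hit_rate = hits / positives if positives else 0.0
    false_rate = false_alarms / negatives if negatives else 0.0
    return hit_rate, false_rate


# Train on 600 synthetic points and evaluate on a disjoint synthetic set.
train_x, train_y = make_data(600, seed=0)
test_x, test_y = make_data(200, seed=1)
weights = train_logistic(train_x, train_y)
predictions = [predict(weights, x) for x in test_x]
hit_rate, false_rate = detection_rates(test_y, predictions)
print(f"correct identification: {hit_rate:.1%}, false detection: {false_rate:.1%}")
```

Lowering the decision threshold in `predict` raises both rates together, which is why the two numbers are best reported as a pair, as the comparison with previous work does.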