In speech analysis, the voiced-unvoiced decision is usually performed in conjunction with pitch analysis. Linking the voiced-unvoiced (V-UV) decision to pitch analysis not only results in unnecessary complexity but also makes it difficult to classify short speech segments that are less than a few pitch periods in duration. In this paper, we describe a pattern recognition approach for deciding whether a given segment of a speech signal should be classified as voiced speech, unvoiced speech, or silence, based on measurements made on the signal. In this method, five different measurements are made on the speech segment to be classified. The measured parameters are the zero-crossing rate, the speech energy, the correlation between adjacent speech samples, the first predictor coefficient from a 12-pole linear predictive coding (LPC) analysis, and the energy in the prediction error. The speech segment is assigned to a particular class based on a minimum-distance rule obtained under the assumption that the measured parameters are distributed according to the multidimensional Gaussian probability density function. The means and covariances for the Gaussian distribution are determined from manually classified speech data included in a training set. The method has been found to provide reliable classification with speech segments as short as 10 ms and has been used for both speech analysis-synthesis and recognition applications. A simple nonlinear smoothing algorithm is described to provide a smooth 3-level contour of an utterance for use in speech recognition applications. Quantitative results and several examples illustrating the performance of the method are included in the paper.
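The classification scheme summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact parameter definitions, so the dB scaling, the autocorrelation-method LPC solution, the small regularization constants, the plain Mahalanobis distance, the 10 kHz sampling rate, and the synthetic training signals are all assumptions made for illustration. The function names (`features`, `train`, `classify`, `synth`) are hypothetical.

```python
import numpy as np

LPC_ORDER = 12  # 12-pole LPC analysis, as stated in the abstract


def features(frame, order=LPC_ORDER, eps=1e-12):
    """Return the five measurements for one frame (definitions are assumed)."""
    s = np.asarray(frame, dtype=float)
    n = len(s)
    # 1. Zero-crossing count: sign changes between adjacent samples.
    zcr = np.count_nonzero(np.diff(np.signbit(s).astype(np.int8)))
    # 2. Log energy of the frame in dB.
    log_energy = 10.0 * np.log10(eps + np.dot(s, s) / n)
    # 3. Normalized correlation between adjacent samples (lag 1).
    c1 = np.dot(s[1:], s[:-1]) / np.sqrt(
        eps + np.dot(s[1:], s[1:]) * np.dot(s[:-1], s[:-1]))
    # 4. LPC coefficients from the autocorrelation normal equations R a = r.
    r = np.array([np.dot(s[:n - k], s[k:]) for k in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx] + 1e-9 * (r[0] + eps) * np.eye(order)  # light diagonal loading
    a = np.linalg.solve(R, r[1:])
    # 5. Log energy of the prediction error in dB.
    err = max(r[0] - np.dot(a, r[1:]), 0.0)
    log_err = 10.0 * np.log10(eps + err / n)
    return np.array([zcr, log_energy, c1, a[0], log_err])


def train(frames_by_class):
    """Estimate the mean and inverse covariance of the features per class."""
    model = {}
    for label, frames in frames_by_class.items():
        X = np.array([features(f) for f in frames])
        # Small ridge (an assumption) keeps the covariance invertible.
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        model[label] = (X.mean(axis=0), np.linalg.inv(cov))
    return model


def classify(frame, model):
    """Assign the class with the smallest Mahalanobis distance."""
    x = features(frame)

    def dist(label):
        mean, cov_inv = model[label]
        d = x - mean
        return d @ cov_inv @ d

    return min(model, key=dist)


# Synthetic stand-ins for manually labeled training data (illustration only).
rng = np.random.default_rng(0)
FS, N = 10_000, 100  # 10 ms frames at an assumed 10 kHz sampling rate


def synth(kind):
    if kind == "voiced":  # noisy low-frequency sinusoid
        f0 = rng.uniform(100.0, 250.0)
        t = np.arange(N) / FS
        phase = rng.uniform(0.0, 2.0 * np.pi)
        return np.sin(2 * np.pi * f0 * t + phase) + 0.05 * rng.standard_normal(N)
    scale = 0.1 if kind == "unvoiced" else 0.001  # noise burst vs. near-silence
    return scale * rng.standard_normal(N)


labels = ("voiced", "unvoiced", "silence")
model = train({k: [synth(k) for _ in range(200)] for k in labels})
print({k: classify(synth(k), model) for k in labels})
```

With real speech, the dictionary passed to `train` would hold hand-labeled frames rather than synthetic signals, and the per-frame labels could then be passed through a short median-type filter as one possible form of nonlinear smoothing.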