The most straightforward way to compare the performance of speech recognizers is to test them on the same database. Agreement on a standard database, however, could be difficult to reach. An alternative is to agree on a set of algorithms (a reference system) and to compare each system against this reference. The reference also serves to quantify the difficulty of the test sets. Differences in performance between a system under test and the reference will be meaningful to the speech community provided the reference system is made widely available. Complete specifications (in FORTRAN) of a set of speech analysis and pattern discrimination algorithms are proposed here for this purpose. The proposed recognizer uses dynamic time warping to optimize the match between unknown and reference utterances. Every utterance is coded as a sequence of vectors of cepstral coefficients, which are obtained from a short-time power spectrum expressed on a mel frequency scale.
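As a rough illustration of the matching step described above, the following Python sketch aligns two sequences of cepstral vectors with dynamic time warping and picks the closest reference utterance. It is not the FORTRAN specification itself. The function names, the Euclidean frame distance, the unconstrained symmetric step pattern, and the cosine transform applied to log energies assumed to be already spaced on a mel scale are all assumptions made for the example.

```python
import math

def cepstrum_from_log_energies(log_energies, n_coeffs):
    """Cosine transform of log filter-bank energies.

    Assumption: the energies come from filters already spaced on a mel scale.
    """
    k = len(log_energies)
    return [
        sum(e * math.cos(i * (j + 0.5) * math.pi / k)
            for j, e in enumerate(log_energies))
        for i in range(1, n_coeffs + 1)
    ]

def euclidean(a, b):
    """Frame-to-frame distance (assumed; other metrics are possible)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Accumulated distance along the best monotonic alignment of two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning the first i and j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of seq_b
                                 cost[i][j - 1],      # repeat a frame of seq_a
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]

def recognize(unknown, templates):
    """Return the label of the reference utterance closest to `unknown`."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))
```

Here `templates` is a hypothetical dictionary mapping each vocabulary word to one reference sequence of cepstral vectors. A practical system would usually add slope constraints and path-length normalization, which this sketch leaves out for brevity.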