Word recognition by means of orthogonal functions

This paper describes experiments in which words are recognized by comparing the projections of input words onto an orthogonal basis with those of a stored library of words. An initial orthogonal basis is determined from the generalized spectrum of short time segments selected from a vocabulary of ten words, and is then optimized by minimizing the complementary error energy. Projecting a spoken word onto the optimum orthogonal basis generates a sequence of numbers that represents the word. The spoken word is identified by correlating the absolute values of this sequence with those of the stored library. For two speakers, the rate of correct recognition varies from 71.6 to 96.6 percent. Techniques are developed to improve the recognition scores and to reduce the lengthy computer processing time and large storage requirement. First, a master template is made for each word by averaging six templates of that word. For one speaker, correct recognition increases to 100 percent when incoming words are compared against the master templates; for a second speaker, the recognition rates improve significantly and vary between 93 and 98 percent. To further improve the recognition process, the feasibility of grouping words into several classes is demonstrated. The classifications are based on the locations of formant regions and the time duration of each spoken word.
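
The recognition scheme summarized above (project a word onto an orthogonal basis, take absolute values, correlate against stored templates, and average several templates into a master template) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the four-point Walsh-type basis stands in for the optimized speech basis, the similarity measure is a plain normalized correlation, each toy "word" is a single four-sample frame, and all function names are hypothetical.

```python
import math

# Toy orthonormal basis (4-point Walsh-type vectors). This is an assumed
# stand-in; the paper derives and optimizes its basis from speech segments.
BASIS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]


def project(frame, basis):
    """Return the inner product of one frame with each basis vector."""
    return [sum(s * b for s, b in zip(frame, vec)) for vec in basis]


def represent(frames, basis):
    """Concatenate the projections of all frames into one number sequence."""
    return [c for frame in frames for c in project(frame, basis)]


def correlation(a, b):
    """Normalized correlation of the absolute values of two sequences."""
    a = [abs(x) for x in a]
    b = [abs(x) for x in b]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def master_template(templates):
    """Average the absolute values of several equal-length templates."""
    n = len(templates)
    return [sum(abs(t[i]) for t in templates) / n
            for i in range(len(templates[0]))]


def recognize(sequence, library):
    """Return the library word whose template correlates best."""
    return max(library, key=lambda word: correlation(sequence, library[word]))


# Two made-up "utterances" per word, each consisting of a single frame.
one_takes = [[[1.0, 1.0, 1.0, 1.0]], [[1.1, 0.9, 1.0, 1.0]]]
two_takes = [[[1.0, 1.0, -1.0, -1.0]], [[0.9, 1.1, -1.0, -1.0]]]

library = {
    "one": master_template([represent(t, BASIS) for t in one_takes]),
    "two": master_template([represent(t, BASIS) for t in two_takes]),
}

guess = recognize(represent([[0.9, 1.1, 1.0, 1.0]], BASIS), library)
print(guess)
```

In this sketch the master template averages two utterances per word, whereas the abstract reports averaging six. Comparing an incoming word against one master template per word, rather than against every stored utterance, is what reduces both the number of correlations and the storage needed.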
