Text-independent speaker identification using binary-pair partitioned neural networks

The N-way speaker identification task is partitioned into N*(N-1)/2 binary-pair classifications. Each binary-pair classification is performed by a small neural network trained to make independent binary decisions on small fragments of speech data. Three issues were investigated concerning how best to combine a large number of fragmentary binary decisions into a single N-way decision: (1) incorporating speech energy and phonetic content information to compute an improved probability measure at the individual speech frame level; (2) combining binary frame-level decisions into a binary segment-level decision; and (3) combining the binary segment-level decisions into a single N-way segment-level decision. It was shown that the two-way classifiers can be combined to achieve 100% speaker identification performance for large speaker populations.
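
The partitioning and the two combination stages can be illustrated with a minimal sketch. The abstract does not state which combination rules were used, so everything here is an assumption: the function names (`speaker_pairs`, `segment_decision`, `identify`) are hypothetical, the frame-to-segment step sums per-frame log-odds, and the segment-to-N-way step uses simple majority voting over the pairwise winners. The per-frame probabilities stand in for the outputs of the pairwise networks, which are not modeled.

```python
import math
from itertools import combinations


def speaker_pairs(n_speakers):
    """Return all N*(N-1)/2 unordered speaker pairs (i, j) with i < j."""
    return list(combinations(range(n_speakers), 2))


def segment_decision(frame_probs, eps=1e-6):
    """Combine frame-level outputs of one pairwise classifier into a
    single binary segment-level decision.

    frame_probs holds, for each frame, the estimated probability that
    the frame belongs to the first speaker of the pair.
    Assumed rule (not from the paper): sum the per-frame log-odds and
    pick the first speaker if the sum is positive.
    """
    log_odds = 0.0
    for p in frame_probs:
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
        log_odds += math.log(p) - math.log(1.0 - p)
    return log_odds > 0.0


def identify(pair_frame_probs, n_speakers):
    """Combine all pairwise segment-level decisions into one N-way decision.

    pair_frame_probs maps each pair (i, j) to its list of frame probabilities.
    Assumed rule (not from the paper): each pairwise winner gets one vote,
    and the speaker with the most votes is returned (ties go to the
    lowest index).
    """
    votes = [0] * n_speakers
    for i, j in speaker_pairs(n_speakers):
        winner = i if segment_decision(pair_frame_probs[(i, j)]) else j
        votes[winner] += 1
    return votes.index(max(votes)), votes
```

For example, with four speakers there are six pairwise classifiers. If the three classifiers involving speaker 2 favor speaker 2 and the others mildly favor their first speaker, the vote identifies speaker 2:

```python
probs = {
    (0, 1): [0.6, 0.6, 0.6],
    (0, 2): [0.2, 0.2, 0.2],
    (0, 3): [0.6, 0.6, 0.6],
    (1, 2): [0.2, 0.2, 0.2],
    (1, 3): [0.6, 0.6, 0.6],
    (2, 3): [0.8, 0.8, 0.8],
}
speaker, votes = identify(probs, 4)  # speaker == 2, votes == [2, 1, 3, 0]
```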
