Features and measures for speaker recognition

Scope and method of study. This work derives and demonstrates new features and measures for automatic speaker recognition and compares them with traditional ones. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. Speaker recognition systems can either identify a particular person or verify a person's claimed identity. The success of these systems depends directly on how well the features and measures discriminate among people, so the focus of this research is to discover powerful features and measures for speaker verification. The scope of the study is limited to speech collected from cooperative users in office environments, without adverse microphone or channel impairments. After a thorough literature review, concepts were synthesized from fields as diverse as signal processing, information theory, pattern recognition, physiology, and speech production and perception. The most promising innovations were then compared analytically and by computer simulation.

Findings and conclusions. New perceptually based features were found, but they did not outperform traditional speech production features with respect to speaker identification error. Powerful new production features and measures for speaker verification were discovered. The main contribution is a new information theoretic shape measure between line spectrum pair (LSP) frequency features. This measure, the divergence shape, can be interpreted geometrically as the shape of an information theoretic measure called divergence. LSPs were found to be very effective features in the divergence shape measure. In the experiments, this combination yields 0.05% speaker identification error, which is more than an order of magnitude lower than any other result reported in the literature.
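The abstract names the divergence shape but does not give its formula. As a rough illustration only, the sketch below assumes the usual Gaussian form of divergence, in which the divergence between two classes splits into a term that depends only on the two covariance matrices and a term that depends on the difference of the means, and it takes the covariance-only term as the "shape". The function names, the use of covariance matrices estimated from LSP feature vectors, and the nearest-model identification rule are all assumptions made for this example, not details confirmed by the study.

```python
# Illustrative sketch (assumed formula): divergence shape between two
# covariance matrices, taken here as
#     0.5 * trace[(A - B) (B^-1 - A^-1)]
# which is the covariance-only term of the Gaussian divergence.
# Pure Python, small matrices only; all names are hypothetical.

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_sub(a, b):
    """Elementwise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_inv(m):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def divergence_shape(cov_a, cov_b):
    """Covariance-only term of the Gaussian divergence (assumed definition)."""
    diff = mat_sub(cov_a, cov_b)
    inv_diff = mat_sub(mat_inv(cov_b), mat_inv(cov_a))
    return 0.5 * trace(mat_mul(diff, inv_diff))

def identify(test_cov, enrolled):
    """Return the enrolled speaker whose covariance is closest to test_cov.

    `enrolled` maps a speaker label to a covariance matrix (hypothetical
    enrollment format chosen for this sketch).
    """
    return min(enrolled, key=lambda name: divergence_shape(test_cov, enrolled[name]))

if __name__ == "__main__":
    cov_a = [[1.0, 0.0], [0.0, 1.0]]
    cov_b = [[2.0, 0.0], [0.0, 4.0]]
    print(divergence_shape(cov_a, cov_b))  # 1.375
    print(identify([[2.1, 0.0], [0.0, 3.9]],
                   {"speaker_a": cov_a, "speaker_b": cov_b}))  # speaker_b
```

Under this assumed definition the measure is zero for identical covariance matrices and symmetric in its two arguments. Identification picks the closest enrolled model, whereas verification would instead compare the measure for the claimed speaker against a threshold.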