Neural net approaches to speaker verification: comparison with second order statistic measures

Kohonen's unsupervised self-organizing map (SOM), the supervised learning vector quantization algorithm (LVQ3), and a method based on second-order statistical measures (SOSM) were adapted, evaluated, and compared for speaker verification on 57 speakers of a POLYPHONE-like database. The SOM and LVQ3 were trained with codebooks of 32 and 256 codes, and two statistical measures were implemented: one without weighting (SOSM1) and one with weighting (SOSM2). The equal error rate (EER) and the best match decision rule (BMDR) were employed and evaluated as decision criteria. Weighted linear predictive cepstrum coefficients (LPCC) and ΔLPCC, two kinds of spectral speech representation, were combined in a single feature vector. LVQ3 shows a performance advantage over the SOM because it allows long-term fine-tuning of the target codebook using speech data from both the client and other speakers, whereas the SOM uses data from the client only. The SOSM performs better than the SOM and LVQ3 for long test utterances, while for short test utterances LVQ3 is the best of the methods studied.
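
As an illustration of the equal error rate used as a decision criterion, the following is a minimal sketch and not the authors' evaluation code. It assumes each verification trial yields a scalar score where a higher value means a closer match to the claimed client (a distortion or distance measure would be negated first). The function name `equal_error_rate` and the example scores are hypothetical.

```python
def equal_error_rate(client_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: higher score = more client-like. A trial is accepted
    when its score is greater than or equal to the threshold.
    Returns (eer, threshold) at the point where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest.
    """
    if not client_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best_eer = None
    best_thr = None
    # Candidate thresholds: every distinct score that was observed.
    for thr in sorted(set(client_scores) | set(impostor_scores)):
        # FRR: fraction of true-client trials that would be rejected.
        frr = sum(s < thr for s in client_scores) / len(client_scores)
        # FAR: fraction of impostor trials that would be accepted.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_thr = thr
    return best_eer, best_thr


# Hypothetical scores, for illustration only (not data from the study).
client = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, thr = equal_error_rate(client, impostor)
print(f"EER = {eer:.2f} at threshold {thr}")
```

With finite score lists the FAR and FRR curves rarely cross exactly, so the sketch reports the average of the two rates at the threshold where they are closest; interpolating between neighbouring thresholds is a common alternative.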