Learning the decision function for speaker verification

This paper explores the possibility of replacing the usual decision rule of speaker verification systems, which thresholds the log likelihood ratio, with more complex, discriminant decision functions based, for instance, on linear regression models or support vector machines. Current speaker verification systems, which are based on generative models such as HMMs or Gaussian mixture models, can easily be adapted to use such decision functions. Experiments on both text-dependent and text-independent tasks always yielded performance improvements, and in some cases the improvements were significant.
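As an illustration of the idea, the following minimal sketch contrasts the classical thresholding rule with a linear decision function fitted by least squares. It rests on assumptions not stated in the abstract: the learned function is assumed to take the client-model and world-model log likelihoods as two separate inputs, the scores are synthetic, and all function names are hypothetical. Under these assumptions the thresholding rule is the special case with weights (1, -1, -threshold).

```python
import numpy as np


def threshold_decision(ll_client, ll_world, threshold):
    """Classical rule: accept when the log likelihood ratio exceeds a threshold."""
    return (ll_client - ll_world) > threshold


def fit_linear_decision(ll_client, ll_world, is_client):
    """Least-squares fit of a*ll_client + b*ll_world + c to targets +1 / -1."""
    features = np.column_stack([ll_client, ll_world, np.ones_like(ll_client)])
    targets = np.where(is_client, 1.0, -1.0)
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights


def linear_decision(weights, ll_client, ll_world):
    """Learned rule: accept when the fitted linear function is positive."""
    a, b, c = weights
    return (a * ll_client + b * ll_world + c) > 0.0


def make_synthetic_scores(n, seed):
    """Synthetic scores (an assumption for illustration, not real speech data)."""
    rng = np.random.default_rng(seed)
    is_client = np.arange(n) % 2 == 0
    ll_world = rng.normal(-60.0, 5.0, size=n)
    # True clients score higher under the client model than under the world model.
    gap = np.where(is_client, 3.0, -3.0)
    ll_client = ll_world + gap + rng.normal(0.0, 1.0, size=n)
    return ll_client, ll_world, is_client


if __name__ == "__main__":
    train = make_synthetic_scores(400, seed=0)
    test = make_synthetic_scores(400, seed=1)
    weights = fit_linear_decision(*train)
    acc_threshold = np.mean(threshold_decision(test[0], test[1], 0.0) == test[2])
    acc_linear = np.mean(linear_decision(weights, test[0], test[1]) == test[2])
    print("threshold rule accuracy:", acc_threshold)
    print("linear rule accuracy:", acc_linear)
```

A support vector machine could be substituted for the least-squares fit on the same two-dimensional input; the sketch uses least squares only to keep the example dependency-free.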
