Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods

This book discusses large margin and kernel methods for speech and speaker recognitionSpeech and Speaker Recognition: Large Margin and Kernel Methods is a collation of research in the recent advances in large margin and kernel methods, as applied to the field of speech and speaker recognition. It presents theoretical and practical foundations of these methods, from support vector machines to large margin methods for structured learning. It also provides examples of large margin based acoustic modelling for continuous speech recognizers, where the grounds for practical large margin sequence learning are set. Large margin methods for discriminative language modelling and text independent speaker verification are also addressed in this book.Key Features:Provides an up-to-date snapshot of the current state of research in this field Covers important aspects of extending the binary support vector machine to speech and speaker recognition applications Discusses large margin and kernel method algorithms for sequence prediction required for acoustic modeling Reviews past and present work on discriminative training of language models, and describes different large margin algorithms for the application of part-of-speech tagging Surveys recent work on the use of kernel approaches to text-independent speaker verification, and introduces the main concepts and algorithms Surveys recent work on kernel approaches to learning a similarity matrix from data This book will be of interest to researchers, practitioners, engineers, and scientists in speech processing and machine learning fields.

[1]  Edward L. Wilson,et al.  Numerical methods in finite element analysis , 1976 .

[2]  Jae Lim,et al.  Signal estimation from modified short-time Fourier transform , 1984 .

[3]  J. Magnus,et al.  Matrix Differential Calculus with Applications in Statistics and Econometrics (Revised Edition) , 1999 .

[4]  Albert S. Bregman,et al.  The Auditory Scene. (Book Reviews: Auditory Scene Analysis. The Perceptual Organization of Sound.) , 1990 .

[5]  Guy L. Scott,et al.  Feature grouping by 'relocalisation' of eigenvectors of the proximity matrix , 1990, BMVC.

[6]  G. Golub,et al.  Tracking a few extreme singular values and vectors in signal processing , 1990, Proc. IEEE.

[7]  Michael L. Overton,et al.  Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices , 2015, Math. Program..

[8]  Guy J. Brown,et al.  Computational auditory scene analysis , 1994, Comput. Speech Lang..

[9]  Martine D. F. Schlag,et al.  Spectral K-way ratio-cut partitioning and clustering , 1994, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[10]  Fan Chung,et al.  Spectral Graph Theory , 1996 .

[11]  Eytan Domany,et al.  Data Clustering Using a Model Granular Magnet , 1997, Neural Computation.

[12]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  John N. Tsitsiklis,et al.  Introduction to linear optimization , 1997, Athena scientific optimization and computation series.

[14]  Alan Edelman,et al.  The Geometry of Algorithms with Orthogonality Constraints , 1998, SIAM J. Matrix Anal. Appl..

[15]  S. Mallat A wavelet tour of signal processing , 1998 .

[16]  Marina Meila,et al.  An Experimental Comparison of Several Clustering and Initialization Methods , 1998, UAI.

[17]  Daniel P. W. Ellis,et al.  Speech and Audio Signal Processing - Processing and Perception of Speech and Music, Second Edition , 1999 .

[18]  Yair Weiss,et al.  Segmentation using eigenvectors: a unifying view , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[19]  Jianbo Shi,et al.  Learning Segmentation by Random Walks , 2000, NIPS.

[20]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[21]  Andrew W. Moore,et al.  'N-Body' Problems in Statistical Learning , 2000, NIPS.

[22]  Sam T. Roweis,et al.  One Microphone Source Separation , 2000, NIPS.

[23]  Özgür Yilmaz,et al.  Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[24]  Naftali Tishby,et al.  Data Clustering by Markovian Relaxation and the Information Bottleneck Method , 2000, NIPS.

[25]  Chris H. Q. Ding,et al.  Spectral Relaxation for K-means Clustering , 2001, NIPS.

[26]  Claire Cardie,et al.  Constrained K-means Clustering with Background Knowledge , 2001, ICML.

[27]  Jianbo Shi,et al.  Grouping with Bias , 2001, NIPS.

[28]  Daniel P. W. Ellis,et al.  The auditory organization of speech and other sources in listeners and computational models , 2001, Speech Commun..

[29]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[30]  Jitendra Malik,et al.  Efficient spatiotemporal grouping using the Nystrom method , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[31]  Tommi S. Jaakkola,et al.  Partially labeled classification with Markov random walks , 2001, NIPS.

[32]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[33]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[34]  Barak A. Pearlmutter,et al.  Blind Source Separation via Multinode Sparse Representation , 2001, NIPS.

[35]  Nello Cristianini,et al.  Spectral Kernel Methods for Clustering , 2001, NIPS.

[36]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[37]  Tomer Hertz,et al.  Learning Distance Functions using Equivalence Relations , 2003, ICML.

[38]  Tomer Hertz,et al.  Learning and inferring image segmentations using the GBP typical cut algorithm , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[39]  Brendan J. Frey,et al.  Probabilistic Inference of Speech Signals from Phaseless Spectrograms , 2003, NIPS.

[40]  Te-Won Lee,et al.  A Maximum Likelihood Approach to Single-channel Source Separation , 2003, J. Mach. Learn. Res..

[41]  Dan Klein,et al.  Spectral Learning , 2003, IJCAI.

[42]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[43]  Ulrike von Luxburg,et al.  Limits of Spectral Clustering , 2004, NIPS.

[44]  E. Oja,et al.  Independent Component Analysis , 2001 .

[45]  Jianbo Shi,et al.  Learning spectral graph segmentation , 2005, AISTATS.

[46]  Chris H. Q. Ding,et al.  On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering , 2005, SDM.

[47]  Michael I. Jordan,et al.  Learning Spectral Clustering, With Application To Speech Separation , 2006, J. Mach. Learn. Res..

[48]  Nello Cristianini,et al.  Fast SDP Relaxations of Graph Cut Clustering, Transduction, and Other Combinatorial Problem , 2006, J. Mach. Learn. Res..

[49]  D. Kandiyoti 1 introduction. , 2005, Journal of the ICRU.

[50]  Zaïd Harchaoui,et al.  DIFFRAC: a discriminative and flexible framework for clustering , 2007, NIPS.

[51]  Francesco Masulli,et al.  A survey of kernel and spectral methods for clustering , 2008, Pattern Recognit..