Feature Extraction Methods for Speaker Recognition: A Review
Smriti Srivastava | Gopal Chaudhary | Saurabh Bhardwaj
[1] Frank K. Soong,et al. On the use of instantaneous and transitional spectral information in speaker recognition , 1988, IEEE Trans. Acoust. Speech Signal Process..
[2] Fred Cummins,et al. Speaker Identification Using Instantaneous Frequencies , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[3] Douglas A. Reynolds,et al. Fusing high- and low-level features for speaker recognition , 2003, INTERSPEECH.
[4] B. Atal,et al. Speech analysis and synthesis by linear prediction of the speech wave. , 1971, The Journal of the Acoustical Society of America.
[5] Richard M. Stern,et al. Features Based on Auditory Physiology and Perception , 2012, Techniques for Noise Robustness in Automatic Speech Recognition.
[6] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[7] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[8] Olli Viikki,et al. A recursive feature vector normalization approach for robust speech recognition in noise , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[9] E. Lenneberg. Biological Foundations of Language , 1967 .
[10] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[11] DeLiang Wang,et al. Robust speaker identification using auditory features and computational auditory scene analysis , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[12] Paavo Alku,et al. On separating glottal source and vocal tract information in telephony speaker verification , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[13] Qi Li,et al. An Auditory-Based Feature Extraction Algorithm for Robust Speaker Identification Under Mismatched Conditions , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[14] Qi Li,et al. An auditory-based transform for audio signal processing , 2009, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[15] Fan-Gang Zeng,et al. Encoding frequency Modulation to improve cochlear implant performance in noise , 2005, IEEE Transactions on Biomedical Engineering.
[16] A. E. Rosenberg,et al. Evaluation of an automatic speaker-verification system over telephone lines , 1976, The Bell System Technical Journal.
[17] John H. L. Hansen,et al. A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition , 2008, Speech Commun..
[18] Mark J. F. Gales,et al. Noisy Constrained Maximum-Likelihood Linear Regression for Noise-Robust Speech Recognition , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[19] Douglas A. Reynolds,et al. Experimental evaluation of features for robust speaker identification , 1994, IEEE Trans. Speech Audio Process..
[20] Petros Maragos,et al. Time-frequency distributions for automatic speech recognition , 2001, IEEE Trans. Speech Audio Process..
[21] John H. L. Hansen,et al. Nonlinear analysis and classification of speech under stressed conditions , 1994 .
[22] Alejandro Acero,et al. Acoustical and environmental robustness in automatic speech recognition , 1991 .
[23] K. Ohkura,et al. Speech recognition in a noisy environment using a noise reduction neural network and a codebook mapping technique , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[24] Tomi Kinnunen,et al. Spectral Features for Automatic Text-Independent Speaker Recognition , 2003 .
[25] Malayappan Shridhar,et al. Text-independent speaker recognition using orthogonal linear prediction , 1981, ICASSP.
[26] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..
[27] José L. Pérez-Córdoba,et al. Histogram equalization of speech representation for robust speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.
[28] DeLiang Wang,et al. CASA-Based Robust Speaker Identification , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[29] A. Hussain,et al. Nonlinear speech processing: Overview and applications , 2002 .
[30] Petros Maragos,et al. Energy separation in signal modulations with application to speech analysis , 1993, IEEE Trans. Signal Process..
[31] Jen-Tzung Chien,et al. Aggregate a posteriori linear regression adaptation , 2006, IEEE Trans. Speech Audio Process..
[32] Kuldip K. Paliwal,et al. Usefulness of phase spectrum in human speech perception , 2003, INTERSPEECH.
[33] Jiqing Han,et al. Sparse-Based auditory Model for robust speaker Recognition , 2012, Int. J. Pattern Recognit. Artif. Intell..
[34] Geoffrey Zweig,et al. LATTICE-BASED UNSUPERVISED MLLR FOR SPEAKER ADAPTATION , 2000 .
[35] Hema A. Murthy. ALGORITHMS FOR PROCESSING FOURIER TRANSFORM PHASE OF SIGNALS , 1991 .
[36] Themos Stafylakis,et al. Combining amplitude and phase-based features for speaker verification with short duration utterances , 2015, INTERSPEECH.
[37] Stan Davis,et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .
[38] Wei Wang,et al. Speaker Verification via Modeling Kurtosis Using Sparse Coding , 2016, Int. J. Pattern Recognit. Artif. Intell..
[39] Elizabeth Shriberg,et al. Higher-Level Features in Speaker Recognition , 2007, Speaker Classification.
[40] Rajesh M. Hegde,et al. Significance of the Modified Group Delay Feature in Speech Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[41] Kuldip K. Paliwal,et al. Frequency-related representation of speech , 2003, INTERSPEECH.
[42] S. Boll,et al. Suppression of acoustic noise in speech using spectral subtraction , 1979 .
[43] Hong-Goo Kang,et al. Speaker recognition based on transformed line spectral frequencies , 2004, Proceedings of 2004 International Symposium on Intelligent Signal Processing and Communication Systems, 2004. ISPACS 2004..
[44] Haizhou Li,et al. Feature Normalization Using Structured Full Transforms for Robust Speech Recognition , 2011, INTERSPEECH.
[45] Eliathamby Ambikairajah,et al. FM features for automatic forensic speaker recognition , 2008, INTERSPEECH.
[46] John G. Harris,et al. Human factor cepstral coefficients , 2002 .
[47] Pedro J. Moreno,et al. Speech recognition in noisy environments , 1996 .
[48] Rhee Man Kil,et al. Auditory processing of speech signals for robust speech recognition in real-world noisy environments , 1999, IEEE Trans. Speech Audio Process..
[49] Philip C. Woodland,et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..
[50] Abeer Alwan,et al. A model of dynamic auditory perception and its application to robust word recognition , 1997, IEEE Trans. Speech Audio Process..
[51] Amita Pal,et al. Speaker Identification by Aggregating Gaussian Mixture Models (GMMs) Based on Uncorrelated MFCC-derived Features , 2014, Int. J. Pattern Recognit. Artif. Intell..
[52] Douglas E. Sturim,et al. The MIT Lincoln Laboratory 2008 speaker recognition system , 2009, INTERSPEECH.
[53] Mark J. F. Gales,et al. Cepstral parameter compensation for HMM recognition in noise , 1993, Speech Commun..
[54] Kuldip K. Paliwal,et al. Speech Coding and Synthesis , 1995 .
[55] Chin-Hui Lee,et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..
[56] Hermann Ney,et al. Using phase spectrum information for improved speech recognition performance , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[57] J. Allen,et al. Nonlinear phenomena as observed in the ear canal and at the auditory nerve. , 1985, The Journal of the Acoustical Society of America.
[58] Kuldip K. Paliwal,et al. Short-time phase spectrum in speech processing: A review and some experimental results , 2007, Digit. Signal Process..
[59] Gang Li,et al. Signal representation based on instantaneous amplitude models with application to speech synthesis , 2000, IEEE Trans. Speech Audio Process..
[60] A. Hudspeth,et al. Essential nonlinearities in hearing. , 2000, Physical review letters.
[61] Judith Markowitz. The Many Roles of Speaker Classification in Speaker Verification and Identification , 2007, Speaker Classification.
[62] Denis Jouvet,et al. Evaluation of a noise-robust DSR front-end on Aurora databases , 2002, INTERSPEECH.
[63] Douglas E. Sturim,et al. Classification Methods for Speaker Recognition , 2007, Speaker Classification.
[64] James R. Glass,et al. Robust Speaker Recognition in Noisy Conditions , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[65] Bayya Yegnanarayana,et al. Combining evidence from residual phase and MFCC features for speaker recognition , 2006, IEEE Signal Processing Letters.
[66] A. Marchal,et al. Speech production and speech modelling , 1990 .
[67] Qiang Huo,et al. A Study of Minimum Classification Error (MCE) Linear Regression for Supervised Adaptation of MCE-Trained Continuous-Density Hidden Markov Models , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[68] B. Yegnanarayana,et al. Significance of group delay functions in signal reconstruction from spectral magnitude or phase , 1984 .
[69] B. Moore,et al. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. , 1983, The Journal of the Acoustical Society of America.
[70] Chin-Hui Lee,et al. Structural maximum a posteriori linear regression for fast HMM adaptation , 2002, Comput. Speech Lang..
[71] Yifan Gong,et al. High-performance HMM adaptation with joint compensation of additive and convolutive distortions via Vector Taylor Series , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).
[72] Laurent Mauuary,et al. Blind equalization for robust telephone based speech recognition , 1996, 1996 8th European Signal Processing Conference (EUSIPCO 1996).
[73] H. M. Teager,et al. Evidence for Nonlinear Sound Production Mechanisms in the Vocal Tract , 1990 .
[74] Tim Jürgens,et al. NOISE ROBUST DISTANT AUTOMATIC SPEECH RECOGNITION UTILIZING NMF BASED SOURCE SEPARATION AND AUDITORY FEATURE EXTRACTION , 2013 .
[75] Shantanu Chakrabartty,et al. Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[76] David R. Cole,et al. Speaker recognition in reverberant enclosures , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[77] Robert B. Dunn,et al. Speech enhancement based on auditory spectral change , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[78] Mark J. F. Gales,et al. Unsupervised Adaptation With Discriminative Mapping Transforms , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[79] Chin-Hui Lee,et al. A structural Bayes approach to speaker adaptation , 2001, IEEE Trans. Speech Audio Process..
[80] Andreas Stolcke,et al. The ICSI-SRI Spring 2006 Meeting Recognition System , 2006, MLMI.
[81] Richard M. Stern,et al. Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[82] Rong Tong,et al. The I4U system in NIST 2008 speaker recognition evaluation , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[83] Abeer Alwan,et al. Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR , 2005, IEEE Transactions on Speech and Audio Processing.
[84] Nikos Fakotakis,et al. Comparative Evaluation of Various MFCC Implementations on the Speaker Verification Task , 2007 .
[85] Lin-Shan Lee,et al. Higher Order Cepstral Moment Normalization for Improved Robust Speech Recognition , 2009, IEEE Trans. Speech Audio Process..
[86] S. Molau,et al. Feature space normalization in adverse acoustic conditions , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[87] A. Gray,et al. Long-term feature averaging for speaker recognition , 1977 .
[88] Wai Nang Chan,et al. Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[89] Alfred Mertins,et al. Contextual invariant-integration features for improved speaker-independent speech recognition , 2011, Speech Commun..
[90] Dimitrios Dimitriadis,et al. Spectral Moment Features Augmented by Low Order Cepstral Coefficients for Robust ASR , 2010, IEEE Signal Processing Letters.
[91] Douglas A. Reynolds,et al. Modeling of the glottal flow derivative waveform with application to speaker identification , 1999, IEEE Trans. Speech Audio Process..
[92] DeLiang Wang,et al. Robust Speaker Identification in Noisy and Reverberant Conditions , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[93] S. Furui,et al. Cepstral analysis technique for automatic speaker verification , 1981 .
[94] Günther Palm,et al. On the use of residual cepstrum in speech recognition , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[95] DeLiang Wang,et al. A computational auditory scene analysis system for speech segregation and robust speech recognition , 2010, Comput. Speech Lang..
[96] K. Takaya,et al. Recognition of syllables in a continuous stream of speech by PARCOR parameters of linear predictive vocoder , 2005, Canadian Conference on Electrical and Computer Engineering, 2005..
[97] K. I. Ramachandran,et al. Towards improving the performance of text/language independent speaker recognition systems , 2014, 2014 International Conference on Power Signals Control and Computations (EPSCICON).
[98] B. Atal. Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. , 1974, The Journal of the Acoustical Society of America.
[99] Fan-Gang Zeng,et al. Speech recognition with amplitude and frequency modulations. , 2005, Proceedings of the National Academy of Sciences of the United States of America.
[100] Jan Van der Spiegel,et al. Robust auditory-based speech processing using the average localized synchrony detection , 2002, IEEE Trans. Speech Audio Process..
[101] Mark J. F. Gales,et al. Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..
[102] Abeer Alwan,et al. Evaluation of noise robust features on the Aurora databases , 2002, INTERSPEECH.
[103] Jeff A. Bilmes,et al. MVA Processing of Speech Features , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[104] Yan Ming Cheng,et al. SNR-dependent waveform processing for improving the robustness of ASR front-end , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[105] Chin-Hui Lee,et al. Joint maximum a posteriori adaptation of transformation and HMM parameters , 2001, IEEE Trans. Speech Audio Process..
[106] Kuldip K. Paliwal,et al. Product of power spectrum and group delay function for speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[107] Tomi Kinnunen,et al. Joint Acoustic-Modulation Frequency for Speaker Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[108] N. Kitaoka. Speech recognition under noisy environments using spectral subtraction with smoothing of time direction and real-time cepstral mean normalization , 2001 .
[109] Mark D Skowronski,et al. Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition. , 2004, The Journal of the Acoustical Society of America.
[110] G. Miet,et al. Speech enhancement via frequency bandwidth extension using line spectral frequencies , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[111] Dong Jin Seo,et al. An approach to robust unsupervised speaker adaptation , 2005, IEEE Signal Process. Lett..
[112] William M. Campbell,et al. Speaker Verification Using Support Vector Machines and High-Level Features , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[113] Jan Cernocký,et al. Probabilistic and Bottle-Neck Features for LVCSR of Meetings , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[114] Frank K. Soong,et al. An auditory system-based feature for robust speech recognition , 2001, INTERSPEECH.
[115] Hynek Hermansky,et al. Temporal patterns (TRAPs) in ASR of noisy speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[116] Douglas D. O'Shaughnessy,et al. Multitaper MFCC and PLP features for speaker verification using i-vectors , 2013, Speech Commun..
[117] Douglas A. Reynolds,et al. The SuperSID project: exploiting high-level information for high-accuracy speaker recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[118] Abeer Alwan,et al. On the use of variable frame rate analysis in speech recognition , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[119] E. Ambikairajah,et al. Extraction of FM components from speech signals using all-pole model , 2008 .
[120] Seung Ho Choi,et al. Cepstrum third-order normalisation method for noisy speech recognition , 1999 .
[121] Daniel P. W. Ellis,et al. Feature extraction using non-linear transformation for robust speech recognition on the Aurora database , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[122] Qiang Fu,et al. Robust Glottal Source Estimation Based on Joint Source-Filter Model Optimization , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[123] E. E. David,et al. Artificial Auditory Recognition in Telephony , 1958, IBM J. Res. Dev..
[124] Haizhou Li,et al. An overview of text-independent speaker recognition: From features to supervectors , 2010, Speech Commun..
[125] Bayya Yegnanarayana,et al. Significance of group delay functions in spectrum estimation , 1992, IEEE Trans. Signal Process..
[126] Hema A. Murthy,et al. The modified group delay function and its application to phoneme recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[127] P. Maragos,et al. Speech formant frequency and bandwidth tracking using multiband energy demodulation , 1996 .
[128] Birger Kollmeier,et al. Amplitude modulation spectrogram based features for robust speech recognition in noisy and reverberant environments , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[129] Petros Maragos,et al. Robust AM-FM features for speech recognition , 2005, IEEE Signal Processing Letters.
[130] Yifan Gong,et al. HMM adaptation using a phase-sensitive acoustic distortion model for environment-robust speech recognition , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.