Adaptive recognition of different accents conversations based on convolutional neural network
