Tomasz Trzciński | Pawel Cyrta | Wojciech Stokowiec
[1] Gregory Gelly, et al. Improving Speaker Diarization of TV Series using Talking-Face Detection and Clustering, 2016, ACM Multimedia.
[2] P. Mermelstein, et al. Distance measures for speech recognition, psychological and instrumental, 1976.
[3] Jordi Luque, et al. Short- and Long-Term Speech Features for Hybrid HMM-i-Vector based Speaker Diarization System, 2016, Odyssey.
[4] Zhuo Chen, et al. Deep clustering: Discriminative embeddings for segmentation and separation, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[6] Sanjeev Khudanpur, et al. Acoustic Modelling from the Signal Domain Using CNNs, 2016, INTERSPEECH.
[7] R. Patterson, et al. B of the SVOS Final Report (Part A: The Auditory Filterbank): An Efficient Auditory Filterbank Based on the Gammatone Function, 2010.
[8] Gang Wang, et al. Convolutional recurrent neural networks: Learning spatial dependencies for image representation, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[9] Andreas Stolcke, et al. The ICSI Meeting Corpus, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (ICASSP '03).
[10] Delphine Charlet, et al. Speaker identification by location in an optimal space of anchor models, 2002, INTERSPEECH.
[11] Matthew Sharifi, et al. Large-scale speaker identification, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Sanjeev Khudanpur, et al. Deep neural network-based speaker embeddings for end-to-end speaker verification, 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[13] Tara N. Sainath, et al. Learning filter banks within a deep neural network framework, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[14] Ron J. Weiss, et al. Speech acoustic modeling from raw multichannel waveforms, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Nicholas W. D. Evans, et al. Speaker Diarization: A Review of Recent Research, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[16] Oliver Durr, et al. Speaker identification and clustering using convolutional neural networks, 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
[17] Hervé Bredin, et al. TristouNet: Triplet loss for speaker turn embedding, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Jean Carletta, et al. The AMI Meeting Corpus: A Pre-announcement, 2005, MLMI.
[19] Lei Wang, et al. Convolutional Recurrent Neural Networks for Text Classification, 2019, 2019 International Joint Conference on Neural Networks (IJCNN).
[20] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, ArXiv.
[21] Ting Liu, et al. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification, 2015, EMNLP.
[22] Hynek Hermansky, et al. RASTA processing of speech, 1994, IEEE Trans. Speech Audio Process.
[23] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.
[24] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[25] Jesse Engel, et al. Learning Multiscale Features Directly from Waveforms, 2016, INTERSPEECH.
[26] George Trigeorgis, et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Alan McCree, et al. Speaker diarization using deep neural network embeddings, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Daniel Garcia-Romero, et al. Speaker diarization with PLDA i-vector scoring and unsupervised calibration, 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[29] Delphine Charlet, et al. Speaker diarization with unsupervised training framework, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] H. Hermansky, et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.
[31] Tuomas Virtanen, et al. Convolutional recurrent neural networks for bird audio detection, 2017, 2017 25th European Signal Processing Conference (EUSIPCO).
[32] Simon Dobrisek, et al. Incorporating Duration Information into I-Vector-Based Speaker Recognition Systems, 2014, Odyssey.
[33] Benjamin Schrauwen, et al. End-to-end learning for music audio, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Yi Liu, et al. Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge, 2016, INTERSPEECH.
[35] Sree Harsha Yella, et al. Speaker diarization of spontaneous meeting room conversations, 2015.
[36] Sylvain Meignier, et al. LIUM SpkDiarization: An Open Source Toolkit for Diarization, 2010.
[37] John Salvatier, et al. Theano: A Python framework for fast computation of mathematical expressions, 2016, ArXiv.
[38] Petr Motlícek, et al. System fusion and speaker linking for longitudinal diarization of TV shows, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Yan Song, et al. Improved i-Vector Representation for Speaker Diarization, 2016, Circuits Syst. Signal Process.
[40] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[41] Dimitri Palaz, et al. Analysis of CNN-based speech recognition system using raw speech as input, 2015, INTERSPEECH.
[42] Judith C. Brown. Calculation of a constant Q spectral transform, 1991.
[43] Mickael Rouvier, et al. Speaker diarization through speaker embeddings, 2015, 2015 23rd European Signal Processing Conference (EUSIPCO).
[44] Themos Stafylakis, et al. I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).