暂无分享,去创建一个
Titouan Parcollet | Yoshua Bengio | Renato De Mori | Mirco Ravanelli | Cem Subakan | Samuele Cornell | Szu-Wei Fu | Aku Rouhe | Jianyuan Zhong | Hwidong Na | Ju-Chieh Chou | Elena Rastorgueva | Loren Lugosch | Peter Plantinga | Sung-Lin Yeh | Nauman Dawalatabad | Chien-Feng Liao | Abdelwahab Heba | Franccois Grondin | Yoshua Bengio | William Aris | Yan Gao | R. Mori | Titouan Parcollet | M. Ravanelli | Szu-Wei Fu | Chien-Feng Liao | Hwidong Na | P. Plantinga | Aku Rouhe | Samuele Cornell | Loren Lugosch | Cem Subakan | Nauman Dawalatabad | A. Heba | Jianyuan Zhong | Ju-Chieh Chou | Sung-Lin Yeh | Elena Rastorgueva | Franccois Grondin | William Aris | Yan Gao | Peter William VanHarn Plantinga | E. Rastorgueva
[1] G. Carter,et al. The generalized correlation method for estimation of time delay , 1976 .
[2] R. O. Schmidt,et al. Multiple emitter location and signal Parameter estimation , 1986 .
[3] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..
[4] Maurizio Omologo,et al. Acoustic event localization using a crosspower-spectrum phase based technique , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[5] Steve Young,et al. The HTK book , 1995 .
[6] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[7] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[8] Kiyohiro Shikano,et al. Julius - an open source real-time large vocabulary recognition engine , 2001, INTERSPEECH.
[9] Fernando Pereira,et al. Weighted finite-state transducers in speech recognition , 2002, Comput. Speech Lang..
[10] Jean Carletta,et al. The AMI Meeting Corpus: A Pre-announcement , 2005, MLMI.
[11] David Garlan,et al. Design fragments make using frameworks easier , 2006, OOPSLA '06.
[12] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[13] Alexander I. Rudnicky,et al. Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[14] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[15] Ulrike von Luxburg,et al. A tutorial on spectral clustering , 2007, Stat. Comput..
[16] Yi Hu,et al. Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[17] Emanuel A. P. Habets,et al. New Insights Into the MVDR Beamformer in Room Acoustics , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[18] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[19] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[20] Maximo Cobos,et al. A Modified SRP-PHAT Functional for Robust Real-Time Sound Source Localization With Scalable Spatial Sampling , 2011, IEEE Signal Processing Letters.
[21] Hermann Ney,et al. RASR - The RWTH Aachen University Open Source Speech Recognition Toolkit , 2011 .
[22] Daniel Garcia-Romero,et al. Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.
[23] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[24] Simon King,et al. The voice bank corpus: Design, collection and data analysis of a large regional accent speech database , 2013, 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE).
[25] Themos Stafylakis,et al. PLDA for speaker verification with utterances of arbitrary duration , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[26] Nobutaka Ito,et al. The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings , 2013 .
[27] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[28] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[29] Jun Du,et al. Robust speech recognition with speech enhanced deep neural networks , 2014, INTERSPEECH.
[30] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[31] John R. Hershey,et al. Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks , 2015, INTERSPEECH.
[32] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Jun Du,et al. Joint training of front-end and back-end deep neural networks for robust speech recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[35] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Reinhold Häb-Umbach,et al. Neural network based spectral mask estimation for acoustic beamforming , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Yoshua Bengio,et al. Batch-normalized joint training for DNN-based distant speech recognition , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[38] Jonathan Le Roux,et al. Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks , 2016, INTERSPEECH.
[39] Kong-Aik Lee,et al. An extensible speaker identification sidekit in Python , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Zhong-Qiu Wang,et al. A Joint Training Framework for Robust Automatic Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[41] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[42] Satoshi Nakamura,et al. An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation , 2017, NMT@ACL.
[43] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[44] Hao Zheng,et al. AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline , 2017, 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA).
[45] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[46] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[47] Cassia Valentini-Botinhao,et al. Noisy speech database for training speech enhancement algorithms and TTS models , 2017 .
[48] Tom Bagby,et al. End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow , 2017, INTERSPEECH.
[49] Hermann Ney,et al. RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition , 2018, ACL.
[50] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[51] Yoshua Bengio,et al. Light Gated Recurrent Units for Speech Recognition , 2018, IEEE Transactions on Emerging Topics in Computational Intelligence.
[52] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[54] Sandeep Subramanian,et al. Deep Complex Networks , 2017, ICLR.
[55] Yongqiang Wang,et al. Towards End-to-end Spoken Language Understanding , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[57] Arun Narayanan,et al. From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[58] Yannick Estève,et al. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation , 2018, SPECOM.
[59] Tara N. Sainath,et al. A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[60] Eric Fosler-Lussier,et al. Spectral Feature Mapping with MIMIC Loss for Robust Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[61] Nanyun Peng,et al. Espresso: A Fast End-to-End Neural Speech Recognition Toolkit , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[62] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[63] Nima Mesgarani,et al. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[64] Yoshua Bengio,et al. Speech Model Pre-training for End-to-End Spoken Language Understanding , 2019, INTERSPEECH.
[65] Titouan Parcollet,et al. Quaternion Recurrent Neural Networks , 2018, ICLR.
[66] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[67] Boris Ginsburg,et al. NeMo: a toolkit for building AI applications using Neural Modules , 2019, ArXiv.
[68] Shuai Wang,et al. BUT System Description to VoxCeleb Speaker Recognition Challenge 2019 , 2019, ArXiv.
[69] Jonathan Le Roux,et al. WHAM!: Extending Speech Separation to Noisy Environments , 2019, INTERSPEECH.
[70] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[71] Boris Ginsburg,et al. Jasper: An End-to-End Convolutional Neural Acoustic Model , 2019, INTERSPEECH.
[72] Titouan Parcollet,et al. The Pytorch-kaldi Speech Recognition Toolkit , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[73] François Michaud,et al. Lightweight and Optimized Sound Source Localization and Tracking Methods for Open and Closed Microphone Array Configurations , 2018, Robotics Auton. Syst..
[74] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[75] Tara N. Sainath,et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling , 2019, ArXiv.
[76] Jonathan Vincent,et al. GEV Beamforming Supported by DOA-Based Masks Generated on Pairs of Microphones , 2020, INTERSPEECH.
[77] Verena Rieser,et al. SLURP: A Spoken Language Understanding Resource Package , 2020, EMNLP.
[78] Gabriel Synnaeve,et al. Real Time Speech Enhancement in the Waveform Domain , 2020, INTERSPEECH.
[79] Yonghui Wu,et al. ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context , 2020, INTERSPEECH.
[80] Luk'avs Burget,et al. Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks , 2020, Comput. Speech Lang..
[81] Jonathan Le Roux,et al. WHAMR!: Noisy and Reverberant Single-Channel Speech Separation , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[82] Johannes Gehrke,et al. The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results , 2020, INTERSPEECH.
[83] Abdel-rahman Mohamed,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[84] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2019, LREC.
[85] Efthymios Tzinis,et al. Asteroid: the PyTorch-based audio source separation toolkit for researchers , 2020, INTERSPEECH.
[86] Antoine Deleforge,et al. LibriMix: An Open-Source Dataset for Generalizable Speech Separation , 2020, 2005.11262.
[87] Pavel Korshunov,et al. Pyannote.Audio: Neural Building Blocks for Speaker Diarization , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[88] Jeremy Howard,et al. fastai: A Layered API for Deep Learning , 2020, Inf..
[89] Kris Demuynck,et al. ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification , 2020, INTERSPEECH.
[90] Boris Ginsburg,et al. Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[91] Lysandre Debut,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[92] Takuya Yoshioka,et al. Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[93] Mikolaj Morzy,et al. WER we are and WER we think we are , 2020, FINDINGS.
[94] Joel Nothman,et al. SciPy 1.0-Fundamental Algorithms for Scientific Computing in Python , 2019, ArXiv.
[95] Abdel-rahman Mohamed,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[96] Boris Ginsburg,et al. Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition , 2021, 2104.01721.
[97] M. Ravanelli,et al. ECAPA-TDNN Embeddings for Speaker Diarization , 2021, Interspeech.
[98] Mirco Ravanelli,et al. Attention Is All You Need In Speech Separation , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[99] Zhuo Chen,et al. ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[100] Gabriel Synnaeve,et al. Rethinking Evaluation in ASR: Are Our Models Robust Enough? , 2020, Interspeech.
[101] Bowon Lee,et al. Integration of Pre-Trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[102] Yu Tsao,et al. MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement , 2021, Interspeech.
[103] Emmanuel Dupoux,et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation , 2021, ACL.
[104] Titouan Parcollet,et al. Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers , 2021, NeurIPS Datasets and Benchmarks.
[105] Andy T. Liu,et al. SUPERB: Speech processing Universal PERformance Benchmark , 2021, Interspeech.
[106] Shrikanth Narayanan,et al. Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.