Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks
[1] Graham Neubig,et al. CTC Alignments Improve Autoregressive Translation , 2022, EACL.
[2] Shinji Watanabe,et al. Minimum latency training of sequence transducers for streaming end-to-end speech recognition , 2022, INTERSPEECH.
[3] Daniel Povey,et al. Pruned RNN-T for fast, memory-efficient ASR training , 2022, INTERSPEECH.
[4] M. Lewis,et al. LegoNN: Building Modular Encoder-Decoder Models , 2022, ArXiv.
[5] P. Bell,et al. Investigating Sequence-Level Normalisation For CTC-Like End-to-End ASR , 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Ronan Collobert,et al. Star Temporal Classification: Sequence Classification with Partially Labeled Data , 2022, ArXiv.
[7] Lili Mou,et al. Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision , 2021, AAAI.
[8] Lei Xie,et al. WENETSPEECH: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Boris Ginsburg,et al. CTC Variations Through New WFST Topologies , 2021, INTERSPEECH.
[10] Hermann Ney,et al. Why does CTC result in peaky behavior? , 2021, ArXiv.
[11] Hung-yi Lee,et al. Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation , 2021, FINDINGS.
[12] Hasim Sak,et al. Reducing Streaming ASR Model Delay with Self Alignment , 2021, Interspeech.
[13] Shinji Watanabe,et al. Intermediate Loss Regularization for CTC-Based Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Mauro Cettolo,et al. CTC-based Compression for Direct Speech Translation , 2021, EACL.
[15] Lei Xie,et al. WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit , 2021, Interspeech.
[16] Jiatao Gu,et al. Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade , 2020, FINDINGS.
[17] Hang Su,et al. Alignment Restricted Streaming Recurrent Neural Network Transducer , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[18] Jonathan Le Roux,et al. Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Yu Wu,et al. Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] M. Seltzer,et al. Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Tara N. Sainath,et al. FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Weinan Zhang,et al. Glancing Transformer for Non-Autoregressive Neural Machine Translation , 2020, ACL.
[23] Jiajun Zhang,et al. Bridging the Modality Gap for Speech-to-Text Translation , 2020, ArXiv.
[24] Vineel Pratap,et al. Differentiable Weighted Finite-State Transducers , 2020, ArXiv.
[25] Hermann Ney,et al. A New Training Pipeline for an Improved Neural Transducer , 2020, INTERSPEECH.
[26] Tetsunori Kobayashi,et al. Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict , 2020, INTERSPEECH.
[27] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[28] Oscar Koller,et al. Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Peter Plantinga,et al. Towards Real-Time Mispronunciation Detection in Kids' Speech , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[30] Shuo Wang,et al. Dense Temporal Convolution Network for Sign Language Translation , 2019, IJCAI.
[31] Mattia Antonino Di Gangi,et al. MuST-C: a Multilingual Speech Translation Corpus , 2019, NAACL.
[32] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[33] Kartik Audhkhasi,et al. Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation , 2019, INTERSPEECH.
[34] Meng Wang,et al. Connectionist Temporal Fusion for Sign Language Translation , 2018, ACM Multimedia.
[35] John H. L. Hansen,et al. Advancing Multi-Accented LSTM-CTC Speech Recognition Using a Domain Specific Student-Teacher Learning Paradigm , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[36] Sanjeev Khudanpur,et al. End-to-end Speech Recognition Using Lattice-free MMI , 2018, INTERSPEECH.
[37] Hui Bu,et al. AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale , 2018, ArXiv.
[38] Matt Post,et al. A Call for Clarity in Reporting BLEU Scores , 2018, WMT.
[39] Shimon Whiteson,et al. TACO: Learning Task Decomposition via Temporal Alignment for Control , 2018, ICML.
[40] Olivier Pietquin,et al. End-to-End Automatic Speech Translation of Audiobooks , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[42] Hao Zheng,et al. AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline , 2017, 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA).
[43] Matt Shannon,et al. Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping , 2017, INTERSPEECH.
[44] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[45] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, arXiv:1611.01599.
[46] Pavlo Molchanov,et al. Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[48] Tara N. Sainath,et al. Acoustic modelling with CD-CTC-SMBR LSTM RNNs , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[49] Hermann Ney,et al. Framewise and CTC training of Neural Networks for handwriting recognition , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).
[50] Ralph Roskies,et al. Bridges: a uniquely flexible HPC resource for new communities and data analytics , 2015, XSEDE.
[51] Johan Schalkwyk,et al. Learning acoustic frame labeling for speech recognition with recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[52] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[54] Nancy Wilkins-Diehr,et al. XSEDE: Accelerating Scientific Discovery , 2014, Computing in Science & Engineering.
[55] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[56] Mauro Cettolo,et al. WIT3: Web Inventory of Transcribed and Translated Talks , 2012, EAMT.
[57] Alex Graves,et al. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.
[58] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.