Textually Pretrained Speech Language Models
Michael Hassid | Tal Remez | Tu Anh Nguyen | Itai Gat | Alexis Conneau | Felix Kreuk | Jade Copet | Alexandre Défossez | Gabriel Synnaeve | Emmanuel Dupoux | Roy Schwartz | Yossi Adi
[1] Oskar van der Wal et al. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling, 2023, arXiv.
[2] Naman Goyal et al. LLaMA: Open and Efficient Foundation Language Models, 2023, arXiv.
[3] O. Pietquin et al. SingSong: Generating musical accompaniments from singing, 2023, arXiv.
[4] Timo I. Denk et al. MusicLM: Generating Music From Text, 2023, arXiv.
[5] W. Freeman et al. Muse: Text-To-Image Generation via Masked Generative Transformers, 2023, ICML.
[6] Yossi Adi et al. Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling, 2023, ICASSP 2023.
[7] B. Ramabhadran et al. Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR, 2022, SLT 2022.
[8] Benoît Sagot et al. Generative Spoken Dialogue Language Modeling, 2022, TACL.
[9] Yossi Adi et al. ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement, 2022, arXiv.
[10] Ankur Bapna et al. Mu2SLAM: Multitask, Multilingual Speech and Language Models, 2022, arXiv.
[11] Yossi Adi et al. Speaking Style Conversion With Discrete Self-Supervised Units, 2022, arXiv.
[12] Jong Wook Kim et al. Robust Speech Recognition via Large-Scale Weak Supervision, 2022, ICML.
[13] Alexander M. Rush et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model, 2022, arXiv.
[14] Gabriel Synnaeve et al. High Fidelity Neural Audio Compression, 2022, arXiv.
[15] Yaniv Taigman et al. AudioGen: Textually Guided Audio Generation, 2022, ICLR.
[16] David Grangier et al. AudioLM: A Language Modeling Approach to Audio Generation, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[17] Akhilesh Deepak Gotmare et al. CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, 2022, NeurIPS.
[18] Jing Yu Koh et al. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation, 2022, Trans. Mach. Learn. Res.
[19] Akshat Gupta. On Building Spoken Language Understanding Systems for Low Resourced Languages, 2022, SIGMORPHON.
[20] Xi Victoria Lin et al. OPT: Open Pre-trained Transformer Language Models, 2022, arXiv.
[21] Yossi Adi et al. Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation, 2022, INTERSPEECH.
[22] Andrew M. Dai et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[23] Lisa Anne Hendricks et al. Training Compute-Optimal Large Language Models, 2022, arXiv.
[24] Ryan J. Lowe et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[25] Abdel-rahman Mohamed et al. textless-lib: a Library for Textless Spoken Language Processing, 2022, NAACL.
[26] Ankur Bapna et al. mSLAM: Massively multilingual joint pre-training for speech and text, 2022, arXiv.
[27] H. Schwenk et al. Textless Speech-to-Speech Translation on Real Data, 2021, NAACL.
[28] Rui Wang et al. SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing, 2021, ACL.
[29] Abdel-rahman Mohamed et al. Text-Free Prosody-Aware Generative Spoken Language Modeling, 2021, ACL.
[30] A. Polyak et al. Direct Speech-to-Speech Translation With Discrete Units, 2021, ACL.
[31] Hung-yi Lee et al. Recent Advances in Pre-trained Language Models: Why Do They Work and How Do They Work, 2022, AACL.
[32] S. Savarese et al. A Conversational Paradigm for Program Synthesis, 2022, arXiv.
[33] Marco Tagliasacchi et al. SoundStream: An End-to-End Neural Audio Codec, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[34] Vijay Janapa Reddi et al. The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage, 2021, NeurIPS Datasets and Benchmarks.
[35] Ankur Bapna et al. SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training, 2021, arXiv.
[36] Adam Polyak et al. fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit, 2021, EMNLP.
[37] Chung-Cheng Chiu et al. w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training, 2021, ASRU 2021.
[38] Benjamin van Niekerk et al. Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing, 2021, INTERSPEECH.
[39] Ruslan Salakhutdinov et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[40] Eugene Kharitonov et al. Speech Resynthesis from Discrete Disentangled Self-Supervised Representations, 2021, INTERSPEECH.
[41] Libo Qin et al. A Survey on Spoken Language Understanding: Recent Advances and New Frontiers, 2021, IJCAI.
[42] Alec Radford et al. Zero-Shot Text-to-Image Generation, 2021, ICML.
[43] Emmanuel Dupoux et al. On Generative Spoken Language Modeling from Raw Audio, 2021, TACL.
[44] Emmanuel Dupoux et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation, 2021, ACL.
[45] Tie-Yan Liu et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2020, ICLR.
[46] Eugene Kharitonov et al. Textless Speech Emotion Conversion using Decomposed and Discrete Representations, 2021, arXiv.
[47] Ewan Dunbar et al. The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling, 2020, arXiv.
[48] Verena Rieser et al. SLURP: A Spoken Language Understanding Resource Package, 2020, EMNLP.
[49] Gabriel Synnaeve et al. MLS: A Large-Scale Multilingual Dataset for Speech Research, 2020, INTERSPEECH.
[50] Jaehyeon Kim et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis, 2020, NeurIPS.
[51] Mark Chen et al. Generative Pretraining From Pixels, 2020, ICML.
[52] Abdel-rahman Mohamed et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[53] Mark Chen et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[54] Satoshi Nakamura et al. Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge, 2020, INTERSPEECH.
[55] Abdel-rahman Mohamed et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision, 2019, ICASSP 2020.
[56] Colin Raffel et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[57] M. Shoeybi et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, arXiv.
[58] Haizhou Li et al. VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019, 2019, INTERSPEECH.
[59] Ryan Prenger et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis, 2018, ICASSP 2019.
[60] Ilya Sutskever et al. Language Models are Unsupervised Multitask Learners, 2019.
[61] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[62] Oriol Vinyals et al. Representation Learning with Contrastive Predictive Coding, 2018, arXiv.
[63] Luke S. Zettlemoyer et al. Deep Contextualized Word Representations, 2018, NAACL.
[64] Sebastian Ruder et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[65] Navdeep Jaitly et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, ICASSP 2018.
[66] Nathanael Chambers et al. A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories, 2016, arXiv.
[67] Rico Sennrich et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[68] David Vandyke et al. Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems, 2015, EMNLP.
[69] Sanjeev Khudanpur et al. Librispeech: An ASR corpus based on public domain audio books, 2015, ICASSP 2015.
[70] Cha Zhang et al. CROWDMOS: An approach for crowdsourcing mean opinion score studies, 2011, ICASSP 2011.
[71] Yoshua Bengio et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[72] Thorsten Brants et al. Large Language Models in Machine Translation, 2007, EMNLP.
[73] Lalit R. Bahl et al. A Maximum Likelihood Approach to Continuous Speech Recognition, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[74] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.