End-to-End Image-to-Speech Generation for Untranscribed Unknown Languages
[1] Michael C. Frank, et al. Unsupervised word discovery from speech using automatic segmentation into syllable-like units, 2015, INTERSPEECH.
[2] Satoshi Nakamura, et al. Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario, 2016, SLTU.
[3] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Florian Metze, et al. Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the “Speaking Rosetta” JSALT 2017 Workshop, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Sanjeev Khudanpur, et al. Unsupervised Learning of Acoustic Sub-word Units, 2008, ACL.
[7] David A. van Leeuwen, et al. Unsupervised acoustic sub-word unit detection for query-by-example spoken term detection, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] James R. Glass, et al. Unsupervised Learning of Spoken Language with Visual Context, 2016, NIPS.
[9] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.
[10] Satoshi Nakamura, et al. Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge, 2020, INTERSPEECH.
[11] Aren Jansen, et al. The zero resource speech challenge 2017, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[12] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Colin Raffel, et al. librosa: Audio and Music Signal Analysis in Python, 2015, SciPy.
[14] Sakriani Sakti, et al. The Zero Resource Speech Challenge 2019: TTS without T, 2019, INTERSPEECH.
[15] Khalil Sima'an, et al. Multi30K: Multilingual English-German Image Descriptions, 2016, VL@ACL.
[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[17] M. Kramer. Nonlinear principal component analysis using autoassociative neural networks, 1991.
[18] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[19] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[20] Haizhou Li, et al. VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019, 2019, INTERSPEECH.
[21] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[22] Sakriani Sakti, et al. The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units, 2020, INTERSPEECH.
[23] James R. Glass, et al. Towards multi-speaker unsupervised speech pattern discovery, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[24] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.
[25] Satoshi Nakamura, et al. Speech-to-Speech Translation Between Untranscribed Unknown Languages, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[26] Mark Hasegawa-Johnson, et al. Image2speech: Automatically generating audio descriptions of images, 2017.
[27] Aren Jansen, et al. Efficient spoken term discovery using randomized algorithms, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[28] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[29] James R. Glass, et al. Deep multimodal semantic embeddings for speech and images, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[30] James R. Glass, et al. Text-Free Image-to-Speech Synthesis Using Learned Segmental Units, 2020, ACL.
[31] Cyrus Rashtchian, et al. Collecting Image Annotations Using Amazon’s Mechanical Turk, 2010, Mturk@HLT-NAACL.
[32] Mark Hasegawa-Johnson, et al. Evaluating Automatically Generated Phoneme Captions for Images, 2020, INTERSPEECH.
[33] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[34] Satoshi Nakamura, et al. End-to-End Speech Recognition Sequence Training With Reinforcement Learning, 2019, IEEE Access.
[35] Sebastian Stüker, et al. Breaking the Unwritten Language Barrier: The BULB Project, 2016, SLTU.
[36] Daniel McDuff, et al. Unpaired Image-to-Speech Synthesis With Multimodal Information Bottleneck, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[37] Lukáš Burget, et al. Variational Inference for Acoustic Unit Discovery, 2016, Workshop on Spoken Language Technologies for Under-resourced Languages.