Mark D. Plumbley | Wenwu Wang | Qiushi Huang | Xubo Liu | Xinhao Mei
[1] Mark D. Plumbley, et al. PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[2] Wenwu Wang, et al. An Encoder-Decoder Based Audio Captioning System with Transfer and Reinforcement Learning, 2021, DCASE.
[3] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[4] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Tuomas Virtanen, et al. Clotho: an Audio Captioning Dataset, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.
[7] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] Tuomas Virtanen, et al. WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information, 2020, 2021 29th European Signal Processing Conference (EUSIPCO).
[9] Kunio Kashino, et al. Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning, 2020, DCASE.
[10] Kai Yu, et al. Audio Caption: Listen and Tell, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] James Glass, et al. PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation, 2021, ArXiv.
[12] Aren Jansen, et al. Audio Set: An ontology and human-labeled dataset for audio events, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[14] Ryo Masumura, et al. A Transformer-based Audio Captioning Model with Keyword Estimation, 2020, INTERSPEECH.
[15] Kunio Kashino, et al. Neural Audio Captioning Based on Conditional Sequence-to-Sequence Model, 2019, DCASE.
[16] Kai Yu, et al. A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning, 2020, DCASE.
[17] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[18] Mark D. Plumbley, et al. Weakly Labelled AudioSet Tagging With Attention Neural Networks, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[19] Wei Liu, et al. CPTR: Full Transformer Network for Image Captioning, 2021, ArXiv.
[20] Masahiro Yasuda, et al. Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval, 2020, ArXiv.
[21] Kai Yu, et al. Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Gunhee Kim, et al. AudioCaps: Generating Captions for Audios in The Wild, 2019, NAACL.
[23] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[24] Kun Chen, et al. Audio Captioning Based on Transformer and Pre-Trained CNN, 2020, DCASE.
[25] Tuomas Virtanen, et al. Automated audio captioning with recurrent neural networks, 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).