Wenwu Wang | Tom Ko | Jinzheng Zhao | Mark D. Plumbley | Yusong Wu | Qiushi Huang | H. Lilian Tang | Xubo Liu | Shengchen Li | Xinhao Mei | Gengyun Chen | Jingqian Wu | Xi Shao | Xingkun Shao
[1] Vaibhava Goel, et al. Self-Critical Sequence Training for Image Captioning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.
[3] Kun Chen, et al. Audio Captioning Based on Transformer and Pre-Trained CNN, 2020, DCASE.
[4] Tuomas Virtanen, et al. Automated audio captioning with recurrent neural networks, 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[5] Basura Fernando, et al. SPICE: Semantic Propositional Image Caption Evaluation, 2016, ECCV.
[6] Kai Yu, et al. Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Zhiyuan Liu, et al. Pre-Trained Models: Past, Present and Future, 2021, AI Open.
[8] Tuomas Virtanen, et al. Clotho: an Audio Captioning Dataset, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Gunhee Kim, et al. AudioCaps: Generating Captions for Audios in The Wild, 2019, NAACL.
[11] Siqi Liu, et al. Improved Image Captioning via Policy Gradient optimization of SPIDEr, 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[12] Ryo Masumura, et al. A Transformer-based Audio Captioning Model with Keyword Estimation, 2020, INTERSPEECH.
[13] Tuomas Virtanen, et al. Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning, 2020, DCASE.
[14] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[15] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[16] Mark D. Plumbley, et al. PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[17] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[19] Kai Yu, et al. A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning, 2020, DCASE.
[20] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments, 2007, WMT@ACL.
[22] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[23] Tuomas Virtanen, et al. WaveTransformer: An Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information, 2020, 2021 29th European Signal Processing Conference (EUSIPCO).
[24] Kunio Kashino, et al. Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning, 2020, DCASE.