A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification
Sabato Marco Siniscalchi | Chin-Hui Lee | Jun Du | C. Yang | Qing Wang | Yuzhong Wu | Siyuan Zheng | Yunqing Li | Yajian Wang | Hu Hu | Yannan Wang
[1] Chin-Hui Lee, et al. A Two-Stage Approach to Device-Robust Acoustic Scene Classification, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] A. Mesaros, et al. A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Annamaria Mesaros, et al. Acoustic Scene Classification in DCASE 2020 Challenge: Generalization Across Devices and Low Complexity Solutions, 2020, DCASE.
[4] Chongruo Wu, et al. ResNeSt: Split-Attention Networks, 2020, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[5] Mark D. Plumbley, et al. PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition, 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[6] Quoc V. Le, et al. Randaugment: Practical automated data augmentation with a reduced search space, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[7] Guisong Xia, et al. A Multiple-Instance Densely-Connected ConvNet for Aerial Scene Classification, 2019, IEEE Transactions on Image Processing.
[8] Jixin Liu, et al. Fusing Object Semantics and Deep Appearance Features for Scene Recognition, 2019, IEEE Transactions on Circuits and Systems for Video Technology.
[9] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[10] Tuomas Virtanen, et al. A multi-device dataset for urban acoustic scene classification, 2018, DCASE.
[11] Bolei Zhou, et al. Places: A 10 Million Image Database for Scene Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Yang Liu, et al. Dictionary Learning Inspired Deep Network for Scene Recognition, 2018, AAAI.
[13] Ankit Shah, et al. DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System, 2017, DCASE.
[14] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[15] Enhua Wu, et al. Squeeze-and-Excitation Networks, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Aren Jansen, et al. Audio Set: An ontology and human-labeled dataset for audio events, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Aren Jansen, et al. CNN architectures for large-scale audio classification, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Mohammed Bennamoun, et al. A Spatial Layout and Scale Invariant Feature Representation for Indoor Scene Classification, 2015, IEEE Transactions on Image Processing.
[22] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[23] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[24] Shoou-I Yu, et al. Multimedia classification and event detection using double fusion, 2014, Multimedia Tools and Applications.
[25] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, ArXiv.
[26] Louis-Philippe Morency, et al. Modeling Latent Discriminative Dynamic of Multi-dimensional Affective Signals, 2011, ACII.
[27] Krista A. Ehinger, et al. SUN database: Large-scale scene recognition from abbey to zoo, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[28] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[29] Loïc Kessous, et al. Emotion Recognition through Multiple Modalities: Face, Body Gesture, Speech, 2008, Affect and Emotion in Human-Computer Interaction.
[31] Anahid N. Jalali, et al. DCASE 2021 Task 1B: Technique Report, 2021.
[32] ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.
[33] Pengyuan Zhang, et al. AUDIO-VISUAL SCENE CLASSIFICATION USING TRANSFER LEARNING AND HYBRID FUSION STRATEGY Technical Report, 2021.
[34] Soichiro Okazaki. LDSLVISION SUBMISSIONS TO DCASE’21: A MULTI-MODAL FUSION APPROACH FOR AUDIO-VISUAL SCENE CLASSIFICATION ENHANCED BY CLIP VARIANTS Technical Report, 2021.
[35] Tomoaki Yoshinaga, et al. A Multi-Modal Fusion Approach for Audio-Visual Scene Classification Enhanced by CLIP Variants, 2021, DCASE.
[36] Daniele Battaglino, et al. Acoustic scene classification using convolutional neural networks, 2016.
[37] Colin Raffel, et al. librosa: Audio and Music Signal Analysis in Python, 2015, SciPy.