Paper information - JTAV: Jointly Learning Social Media Content Representation by Fusing Textual, Acoustic, and Visual Features

Learning social media content is the basis of many real-world applications, including information retrieval and recommendation systems. In contrast with previous works that focus mainly on single-modal or bi-modal learning, we propose to learn social media content by jointly fusing textual, acoustic, and visual information (JTAV). We propose effective strategies, namely attBiGRU and DCRNN, to extract fine-grained features of each modality. We also introduce cross-modal fusion and attentive pooling techniques to integrate multi-modal information comprehensively. Extensive experiments on real-world datasets demonstrate that our proposed model outperforms state-of-the-art approaches by a large margin.
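The abstract names attentive pooling as one way to combine several feature vectors (for example, per-timestep hidden states or per-modality representations) into a single vector. As a rough illustration only, the sketch below implements generic additive attention pooling in the style of hierarchical attention networks: score each vector with $e_i = u^\top \tanh(W h_i + b)$, normalize the scores with a softmax, and return the weighted sum. This is an assumption, not the JTAV authors' exact formulation. The function names and the weights `W`, `b`, and `u` are hypothetical; in a trained model these weights would be learned rather than fixed by hand.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(H: List[Vector], W: Matrix, b: Vector, u: Vector) -> Tuple[Vector, Vector]:
    """Pool vectors H[0..n-1] into one vector with additive attention.

    Hypothetical sketch: e_i = u . tanh(W h_i + b), alpha = softmax(e),
    output = sum_i alpha_i * h_i. Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in H:
        # Project the vector, apply tanh, then score it against context vector u.
        hidden = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(ui * hi for ui, hi in zip(u, hidden)))
    alpha = softmax(scores)
    dim = len(H[0])
    # Weighted sum of the input vectors, dimension by dimension.
    pooled = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(dim)]
    return pooled, alpha


if __name__ == "__main__":
    # Hypothetical 2-d feature vectors standing in for text, audio and image features.
    features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
    b = [0.0, 0.0]
    u = [1.0, -1.0]  # context vector that favours the first dimension
    pooled, weights = attentive_pool(features, W, b, u)
    print("weights:", [round(w, 3) for w in weights])
    print("pooled:", [round(x, 3) for x in pooled])
```

Because the output is a convex combination of the inputs, the pooled vector always lies within the span of the input vectors. The attention weights also show which input contributed most, which is one reason attention pooling is often preferred over plain averaging or concatenation.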
