[1] Akane Sano, et al. Multi-task, Multi-Kernel Learning for Estimating Individual Wellbeing, 2015.
[2] Alexander Kotov, et al. Deep Neural Architecture for Multi-Modal Retrieval based on Joint Embedding Space for Text and Images, 2018, WSDM.
[3] Marcus Rohrbach, et al. Multimodal Video Description, 2016, ACM Multimedia.
[4] John A. Bateman, et al. Text and Image, 2014.
[5] Ralph Ewerth, et al. Estimating the Information Gap between Textual and Visual Representations, 2017, ICMR.
[6] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[8] Marilyn Domas White, et al. A taxonomy of relationships between images and text, 2003, J. Documentation.
[9] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[10] Ronggang Wang, et al. Cross-media Retrieval by Learning Rich Semantic Embeddings of Multimedia, 2017, ACM Multimedia.
[11] Jean Maillard, et al. Black Holes and White Rabbits: Metaphor Identification with Visual Features, 2016, NAACL.
[12] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[13] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[14] Yu-Chiang Frank Wang, et al. A Novel Multiple Kernel Learning Framework for Heterogeneous Feature Fusion and Variable Selection, 2012, IEEE Transactions on Multimedia.
[15] Xu Jia, et al. Guiding the Long-Short Term Memory Model for Image Caption Generation, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[16] Joon Son Chung, et al. Deep Audio-Visual Speech Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Qin Jin, et al. Video Description Generation using Audio and Visual Cues, 2016, ICMR.
[18] Ning Ma, et al. Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Andrew Salway, et al. A system for image–text relations in new (and old) media, 2005.
[21] Ethem Alpaydin, et al. Multiple Kernel Learning Algorithms, 2011, J. Mach. Learn. Res.
[22] Shengcai Liao, et al. Cross-Modal Similarity Learning: A Low Rank Bilinear Formulation, 2014, CIKM.
[23] Erik Cambria, et al. Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis, 2015, EMNLP.
[24] Jianping Yin, et al. Multiple Kernel Learning in the Primal for Multimodal Alzheimer’s Disease Classification, 2013, IEEE Journal of Biomedical and Health Informatics.
[25] Matthieu Cord, et al. Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings, 2018, SIGIR.
[26] Rong Jin, et al. Multiple Kernel Learning for Visual Object Recognition: A Review, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Dong Cao, et al. Self-Paced Cross-Modal Subspace Matching, 2016, SIGIR.
[28] Roland Göcke, et al. Extending Long Short-Term Memory for Multi-View Structured Learning, 2016, ECCV.
[29] Ina Blümel, et al. Figures in Scientific Open Access Publications, 2018, TPDL.
[30] R. Barthes, et al. Image-Music-Text, 1977.
[31] Diyi Yang, et al. Hierarchical Attention Networks for Document Classification, 2016, NAACL.
[32] Christian Wolf, et al. ModDrop: Adaptive Multi-Modal Gesture Recognition, 2014, IEEE Trans. Pattern Anal. Mach. Intell.
[33] Christian M. I. M. Matthiessen, et al. Halliday's Introduction to Functional Grammar, 2014.
[34] Len Unsworth, et al. Image/Text Relations and Intersemiosis: Towards Multimodal Text Description for Multiliteracies Education, 2006.
[35] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[36] Zi Huang, et al. Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval, 2016, IEEE Transactions on Multimedia.