Multimedia Intelligence: When Multimedia Meets Artificial Intelligence
[1] Davar Pishva. Spectroscopically Enhanced Method and System for Multi-Factor Biometric Authentication , 2008, IEICE Trans. Inf. Syst..
[2] Phil Blunsom,et al. Recurrent Continuous Translation Models , 2013, EMNLP.
[3] Cordelia Schmid,et al. Weakly-Supervised Alignment of Video with Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[4] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[5] Neil Martin Robertson,et al. Deep Head Pose: Gaze-Direction Estimation in Multimodal Video , 2015, IEEE Transactions on Multimedia.
[6] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[7] Kevin P. Murphy,et al. A coupled HMM for audio-visual speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[8] Lin Ma,et al. Temporally Grounding Natural Sentence in Video , 2018, EMNLP.
[9] Razvan Pascanu,et al. A simple neural network module for relational reasoning , 2017, NIPS.
[10] Kevin Murphy,et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision , 2015, NAACL.
[11] Christos Faloutsos,et al. Efficient Similarity Search In Sequence Databases , 1993, FODO.
[12] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.
[13] Qing Yang,et al. HMOG: New Behavioral Biometric Features for Continuous Authentication of Smartphone Users , 2015, IEEE Transactions on Information Forensics and Security.
[14] Luc De Raedt,et al. DeepProbLog: Neural Probabilistic Logic Programming , 2018, BNAIC/BENELEARN.
[15] Tao Mei,et al. Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Tao Mei,et al. To Create What You Tell: Generating Videos from Captions , 2017, ACM Multimedia.
[18] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] B.P. Yuhas,et al. Integration of acoustic and visual speech signals using neural networks , 1989, IEEE Communications Magazine.
[20] Ah Chung Tsoi,et al. The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.
[21] Graham W. Taylor,et al. Deep Multimodal Learning: A Survey on Recent Advances and Trends , 2017, IEEE Signal Processing Magazine.
[22] Song-Chun Zhu,et al. A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[23] Dacheng Tao,et al. Robust Face Recognition via Multimodal Deep Face Representation , 2015, IEEE Transactions on Multimedia.
[24] Dimitris N. Metaxas,et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Liang Lin,et al. Visual Question Reasoning on General Dependency Tree , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[26] Pushpak Bhattacharyya,et al. Everybody loves a rich cousin: An empirical study of transliteration through bridge languages , 2010, NAACL.
[27] Christopher D. Manning,et al. GQA: a new dataset for compositional question answering over real-world images , 2019, ArXiv.
[28] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[29] Christopher D. Manning,et al. Compositional Attention Networks for Machine Reasoning , 2018, ICLR.
[30] Christian Wolf,et al. ModDrop: Adaptive Multi-Modal Gesture Recognition , 2014, IEEE Trans. Pattern Anal. Mach. Intell..
[31] Hwee Tou Ng,et al. Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages , 2012, J. Artif. Intell. Res..
[32] Md. Zakir Hossain,et al. A Comprehensive Survey of Deep Learning for Image Captioning , 2018, ACM Comput. Surv..
[33] Shenghua Gao,et al. Multiview Multitask Gaze Estimation With Deep Convolutional Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[34] Wenwu Zhu,et al. Two decades of internet video streaming: A retrospective view , 2013, TOMCCAP.
[35] Tao Mei,et al. Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Chuang Gan,et al. Watch, Reason and Code: Learning to Represent Videos Using Program , 2019, ACM Multimedia.
[37] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[38] Xin Wang,et al. Recommending Groups to Users Using User-Group Engagement and Time-Dependent Matrix Factorization , 2016, AAAI.
[39] Piotr Indyk,et al. Similarity Search in High Dimensions via Hashing , 1999, VLDB.
[40] Xin Wang,et al. Learning Personalized Preference of Strong and Weak Ties for Social Recommendation , 2017, WWW.
[41] Vladimir Pavlovic,et al. Boosted learning in dynamic Bayesian networks for multimodal speaker detection , 2003, Proc. IEEE.
[42] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[43] Pavel Zezula,et al. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces , 1997, VLDB.
[44] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[45] Xiaodan Liang,et al. Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Rob Miller,et al. VizWiz: nearly real-time answers to visual questions , 2010, UIST.
[47] Dan Klein,et al. Learning to Compose Neural Networks for Question Answering , 2016, NAACL.
[48] Xinlei Chen,et al. Iterative Visual Reasoning Beyond Convolutions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[49] Larry S. Davis,et al. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[51] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[52] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Matthieu Cord,et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Nuno Vasconcelos,et al. Bridging the Gap: Query by Semantic Example , 2007, IEEE Transactions on Multimedia.
[55] Björn W. Schuller,et al. LSTM-Modeling of continuous emotions in an audiovisual affect recognition framework , 2013, Image Vis. Comput..
[56] Wei Xu,et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Nitish Srivastava,et al. Learning Representations for Multimodal Data with Deep Belief Nets , 2012.
[58] Nasser Kehtarnavaz,et al. UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor , 2015, 2015 IEEE International Conference on Image Processing (ICIP).
[59] Jing Zhang,et al. MirrorGAN: Learning Text-To-Image Generation by Redescription , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Rajarshi Das,et al. Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering , 2019, ICLR.
[62] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[63] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[64] Tao Mei,et al. Video Captioning with Transferred Semantic Attributes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[65] Christopher Joseph Pal,et al. EmoNets: Multimodal deep learning approaches for emotion recognition in video , 2015, Journal on Multimodal User Interfaces.
[66] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Wenwu Zhu,et al. Deep Multimodal Hashing with Orthogonal Regularization , 2015, IJCAI.
[68] Xin Wang,et al. Visual Query Answering by Entity-Attribute Graph Matching and Reasoning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[69] Ming Shao,et al. A Multi-stream Bi-directional Recurrent Neural Network for Fine-Grained Action Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[70] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[71] Yong Rui,et al. Image search—from thousands to billions in 20 years , 2013, TOMCCAP.
[72] Matthieu Cord,et al. Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval , 2009, J. Electronic Imaging.
[73] Yejin Choi,et al. Collective Generation of Natural Image Descriptions , 2012, ACL.
[74] Michael I. Jordan,et al. Factorial Hidden Markov Models , 1995, Machine Learning.
[75] Svetlana Lazebnik,et al. Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering , 2018, NeurIPS.
[76] Balaraman Ravindran,et al. Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning , 2015, NAACL.
[77] Sergio Escalera,et al. ChaLearn Looking at People Challenge 2014: Dataset and Results , 2014, ECCV Workshops.
[78] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[79] Marco Baroni,et al. Grounding Distributional Semantics in the Visual World , 2016, Lang. Linguistics Compass.
[80] Wei Wang,et al. A Comprehensive Survey on Cross-modal Retrieval , 2016, ArXiv.
[81] Ruslan Salakhutdinov,et al. Generating Images from Captions with Attention , 2015, ICLR.
[82] Benjamin Schrauwen,et al. Deep content-based music recommendation , 2013, NIPS.
[83] Ole Winther,et al. Recurrent Relational Networks , 2017, NeurIPS.
[84] Dan Klein,et al. Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[85] Jingdong Wang,et al. Deeply-Learned Part-Aligned Representations for Person Re-identification , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[86] Tao Mei,et al. To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression , 2018, AAAI.
[87] Xuan Dong,et al. Chain of Reasoning for Visual Question Answering , 2018, NeurIPS.
[88] Ling Shao,et al. Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[89] Shuang Wu,et al. Multimodal feature fusion for robust event detection in web videos , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[90] Li Fei-Fei,et al. Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[91] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[92] Chuang Gan,et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding , 2018, NeurIPS.
[93] Ruzena Bajcsy,et al. Berkeley MHAD: A comprehensive Multimodal Human Action Database , 2013, 2013 IEEE Workshop on Applications of Computer Vision (WACV).
[94] Xin Wang,et al. Semi-supervised Deep Quantization for Cross-modal Search , 2019, ACM Multimedia.
[95] Lin Ma,et al. Multimodal Convolutional Neural Networks for Matching Image and Sentence , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[96] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.
[97] Ethem Alpaydin,et al. Multiple Kernel Learning Algorithms , 2011, J. Mach. Learn. Res..
[98] Xin Wang,et al. Cross-Modal Dual Learning for Sentence-to-Video Generation , 2019, ACM Multimedia.
[99] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[100] Xin Wang,et al. Multi-Modal Deep Analysis for Multimedia , 2019, IEEE Transactions on Circuits and Systems for Video Technology.
[101] Chuang Gan,et al. Weakly Supervised Dense Event Captioning in Videos , 2018, NeurIPS.
[102] Qi Wu,et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[103] Jean-Philippe Thiran,et al. Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition , 2008, ICMI '08.
[104] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[105] Xin Wang,et al. Perceptual Visual Reasoning with Knowledge Propagation , 2019, ACM Multimedia.
[106] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[107] Shih-Fu Chang,et al. Grounding Referring Expressions in Images by Variational Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[108] Tao Mei,et al. Video Summarization by Learning Deep Side Semantic Embedding , 2019, IEEE Transactions on Circuits and Systems for Video Technology.
[109] Lin Ma,et al. Multimodal learning for facial expression recognition , 2015, Pattern Recognit..
[110] David Mascharka,et al. Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[111] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[112] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[113] Yale Song,et al. TVSum: Summarizing web videos using titles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[114] Jun Wang,et al. Multi-Agent Reinforcement Learning , 2020, Deep Reinforcement Learning.
[115] Haoqi Fan,et al. Stacked Latent Attention for Multimodal Reasoning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[116] Wenwu Zhu,et al. Social Recommendation with Optimal Limited Attention , 2019, KDD.
[117] Trevor Darrell,et al. Explainable Neural Computation via Stack Neural Module Networks , 2018, ECCV.
[118] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[119] Erik Cambria,et al. Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).
[120] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[121] Xiaodan Liang,et al. Layout-Graph Reasoning for Fashion Landmark Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[122] Xin Wang,et al. Social Recommendation with Strong and Weak Ties , 2016, CIKM.
[123] Ali Farhadi,et al. Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[124] Trevor Darrell,et al. Learning to Reason: End-to-End Module Networks for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[125] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[126] Xin Wang,et al. Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation , 2019, IJCAI.
[127] Shuicheng Yan,et al. Graph-Based Global Reasoning Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[128] Roger Levy,et al. A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.
[129] Nicu Sebe,et al. A Survey on Learning to Hash , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[130] Louis-Philippe Morency,et al. Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[131] Jiwen Lu,et al. Structural Relational Reasoning of Point Clouds , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).