Cross-media Structured Common Space for Multimedia Event Extraction
Heng Ji | Shih-Fu Chang | Di Lu | Manling Li | Qi Zeng | Alireza Zareian | Spencer Whitehead
[1] Heng Ji, et al. Refining Event Extraction through Cross-Document Inference, 2008, ACL.
[2] Philipp Koehn, et al. Abstract Meaning Representation for Sembanking, 2013, LAW@ACL.
[3] Yi Yang, et al. Bi-Level Semantic Representation Analysis for Multimedia Event Detection, 2017, IEEE Transactions on Cybernetics.
[4] Yongdong Zhang, et al. Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching, 2019, ACM Multimedia.
[5] Chenliang Xu, et al. TRECVID 2012 GENIE: Multimedia Event Detection and Recounting, 2012, TRECVID.
[6] Weiwei Sun, et al. UniVSE: Robust Visual Semantic Embeddings via Structured Semantic Representations, 2019, ArXiv.
[7] Tao Mei, et al. Recurrent Tubelet Proposal and Recognition Networks for Action Detection, 2018, ECCV.
[8] Ali Farhadi, et al. Situation Recognition: Visual Semantic Role Labeling for Image Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Nan Duan, et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training, 2019, AAAI.
[10] Heng Ji, et al. Joint Entity and Event Extraction with Generative Adversarial Imitation Learning, 2019, Data Intelligence.
[11] Heng Ji, et al. Reliability-aware Dynamic Feature Composition for Name Tagging, 2019, ACL.
[12] Hannaneh Hajishirzi, et al. Entity, Relation, and Event Extraction with Contextualized Span Representations, 2019, EMNLP.
[13] Jun Zhao, et al. Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks, 2015, ACL.
[14] Alejandro Héctor Toselli, et al. Viterbi Based Alignment between Text Images and their Transcripts, 2007, LaTeCH@ACL 2007.
[15] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Chuan Wang, et al. A Transition-based Algorithm for AMR Parsing, 2015, NAACL.
[17] Mubarak Shah, et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, 2012, ArXiv.
[18] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[19] Ioannis A. Kakadiaris, et al. Adversarial Representation Learning for Text-to-Image Matching, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[21] Chris Callison-Burch, et al. Learning Translations via Images with a Massively Multilingual Image Dataset, 2018, ACL.
[22] Dongsheng Li, et al. Exploring Pre-trained Language Models for Event Extraction and Generation, 2019, ACL.
[23] Yu Cheng, et al. UNITER: UNiversal Image-TExt Representation Learning, 2019, ECCV.
[24] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[25] Cordelia Schmid, et al. Learning Video Representations using Contrastive Bidirectional Transformer, 2019.
[26] Max Welling, et al. Semi-Supervised Classification with Graph Convolutional Networks, 2016, ICLR.
[27] Ralph Grishman, et al. Acquiring Topic Features to improve Event Extraction: in Pre-selected and Balanced Collections, 2011, RANLP.
[28] Kaiming He, et al. Long-Term Feature Banks for Detailed Video Understanding, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Heng Ji, et al. Improving Event Extraction via Multimodal Integration, 2017, ACM Multimedia.
[30] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[31] Rui Wang, et al. Open Event Extraction from Online Text using a Generative Adversarial Network, 2019, EMNLP.
[32] Cordelia Schmid, et al. Contrastive Bidirectional Transformer for Temporal Representation Learning, 2019, ArXiv.
[33] Bernard Ghanem, et al. ActivityNet: A large-scale video benchmark for human activity understanding, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[35] Louis-Philippe Morency, et al. Integrating Multimodal Information in Large Pretrained Transformers, 2020, ACL.
[36] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.
[37] Rebecka Weegar, et al. Linking Entities Across Images and Text, 2015, CoNLL.
[38] Jun Zhao, et al. Collective Event Detection via a Hierarchical and Bias Tagging Networks with Gated Multi-level Attention Mechanisms, 2018, EMNLP.
[39] David J. Fleet, et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, 2017, BMVC.
[40] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.
[41] Cho-Jui Hsieh, et al. VisualBERT: A Simple and Performant Baseline for Vision and Language, 2019, ArXiv.
[42] Yuning Jiang, et al. Learning Visually-Grounded Semantics from Contrastive Adversarial Samples, 2018, COLING.
[43] Carina Silberer, et al. Grounding Semantic Roles in Images, 2018, EMNLP.
[44] Chao Zhang, et al. Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[45] Dong Liu, et al. EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video, 2015, ACM Multimedia.
[46] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[47] Chuan Wang, et al. Boosting Transition-based AMR Parsing with Refined Actions and Auxiliary Analyzers, 2015, ACL.
[48] Yin Li, et al. Compositional Learning for Human Object Interaction, 2018, ECCV.
[49] Heng Ji, et al. Joint Event Extraction via Structured Prediction with Global Features, 2013, ACL.
[50] Changsheng Xu, et al. Semantic Event Extraction from Basketball Games using Multi-Modal Analysis, 2007, 2007 IEEE International Conference on Multimedia and Expo.
[51] Xiao Liu, et al. Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation, 2018, EMNLP.
[52] Louis-Philippe Morency, et al. M-BERT: Injecting Multimodal Information in the BERT Structure, 2019, ArXiv.
[53] Jordi Pont-Tuset, et al. The Open Images Dataset V4, 2018, International Journal of Computer Vision.
[54] Mubarak Shah, et al. VideoCapsuleNet: A Simplified Network for Action Detection, 2018, NeurIPS.
[55] Svetlana Lazebnik, et al. Recurrent Models for Situation Recognition, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[56] Cordelia Schmid, et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[57] Guodong Zhou, et al. Self-regulation: Employing a Generative Adversarial Network to Improve Event Detection, 2018, ACL.
[58] Ali Farhadi, et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding, 2016, ECCV.
[59] Heng Ji, et al. CAMR at SemEval-2016 Task 8: An Extended Transition-based AMR Parser, 2016, SemEval@NAACL-HLT.
[60] Mitchell Stephens, et al. The rise of the image, the fall of the word, 1998.
[61] Cordelia Schmid, et al. VideoBERT: A Joint Model for Video and Language Representation Learning, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[62] James Allan, et al. Multimedia Event Detection and Recounting, 2013.
[63] Ellen Riloff, et al. Bootstrapped Training of Event Extraction Classifiers, 2012, EACL.
[64] Nicu Sebe, et al. Joint Attributes and Event Analysis for Multimedia Event Detection, 2018, IEEE Transactions on Neural Networks and Learning Systems.
[65] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.
[66] Ralph Grishman, et al. Joint Event Extraction via Recurrent Neural Networks, 2016, NAACL.
[67] Licheng Yu, et al. UNITER: Learning UNiversal Image-TExt Representations, 2019, ArXiv.
[68] Christopher R. Johnson, et al. Background to Framenet, 2003.