Ali Farhadi | Jiasen Lu | Jack Hessel | Mohammadreza Salehi | Yanpeng Zhao | Youngjae Yu | Rowan Zellers | Yejin Choi | Ximing Lu | Aditya Kusupati
[1] Danah Boyd,et al. Networked privacy: How teenagers negotiate context in social media , 2014, New Media Soc..
[2] Morgan Klaus Scheuerman,et al. Gender Recognition or Gender Reductionism?: The Social Implications of Embedded Gender Recognition Systems , 2018, CHI.
[3] Cordelia Schmid,et al. Just Ask: Learning to Answer Questions from Millions of Narrated Videos , 2020, ArXiv.
[4] John R Clark,et al. When good isn't good enough. , 2014, Air medical journal.
[5] Preethi Jyothi,et al. Cross-Modal learning for Audio-Visual Video Parsing , 2021, Interspeech 2021.
[6] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[7] Mariana L. Neves,et al. Neural Domain Adaptation for Biomedical Question Answering , 2017, CoNLL.
[8] Donna Haraway. Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective , 2022, Philosophical Literary Journal Logos.
[9] Quoc V. Le,et al. AutoAugment: Learning Augmentation Policies from Data , 2018, ArXiv.
[10] Tarleton Gillespie,et al. Content moderation, AI, and the question of scale , 2020, Big Data Soc..
[11] Dmytro Okhonko,et al. VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding , 2021, EMNLP.
[12] Louis-Philippe Morency,et al. Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[13] Giovanni Maria Farinella,et al. Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[14] Daniel Brissaud,et al. Drawing a chip environmental profile: environmental indicators for the semiconductor industry , 2015 .
[15] Omer Levy,et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans , 2019, TACL.
[16] R S Chapman,et al. Children's language learning: an interactionist perspective. , 2000, Journal of child psychology and psychiatry, and allied disciplines.
[17] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[18] Apostol Natsev,et al. YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.
[19] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[20] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Radu Soricut,et al. A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions , 2019, CoNLL.
[22] Jon E. Froehlich,et al. Toward User-Driven Sound Recognizer Personalization with People Who Are d/Deaf or Hard of Hearing , 2021, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol..
[23] James Glass,et al. AST: Audio Spectrogram Transformer , 2021, Interspeech 2021.
[24] Ali Farhadi,et al. Defending Against Neural Fake News , 2019, NeurIPS.
[25] Yael Pritch,et al. Clustered Synopsis of Surveillance Video , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.
[26] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[27] Matthew Crain,et al. The limits of transparency: Data brokers and commodification , 2018, New Media Soc..
[28] Christian Fuchs,et al. An Alternative View of Privacy on Facebook , 2011, Inf..
[29] Colin Raffel,et al. Extracting Training Data from Large Language Models , 2020, USENIX Security Symposium.
[30] Aäron van den Oord,et al. Multimodal Self-Supervised Learning of General Audio Representations , 2021, ArXiv.
[31] Licheng Yu,et al. TVQA: Localized, Compositional Video Question Answering , 2018, EMNLP.
[32] P. L. Adams. THE ORIGINS OF INTELLIGENCE IN CHILDREN , 1976 .
[33] G. Edelman. Neural Darwinism: Selection and reentrant signaling in higher brain function , 1993, Neuron.
[34] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[35] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[36] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[37] Vinay Uday Prabhu,et al. Multimodal datasets: misogyny, pornography, and malignant stereotypes , 2021, ArXiv.
[38] Nojun Kwak,et al. Self-supervised pre-training and contrastive representation learning for multiple-choice video QA , 2020, AAAI.
[39] Maarten Sap,et al. Documenting the English Colossal Clean Crawled Corpus , 2021, ArXiv.
[40] Hanqing Lu,et al. OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation , 2021, ArXiv.
[41] Michael Gasser,et al. The Development of Embodied Cognition: Six Lessons from Babies , 2005, Artificial Life.
[42] Adams Wei Yu,et al. SimVLM: Simple Visual Language Model Pretraining with Weak Supervision , 2021, ArXiv.
[43] Virgílio A. F. Almeida,et al. Auditing radicalization pathways on YouTube , 2019, FAT*.
[44] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[45] Christopher D. Manning,et al. Contrastive Learning of Medical Visual Representations from Paired Images and Text , 2020, MLHC.
[46] David Reitter,et al. Fusion of Detected Objects in Text for Visual Question Answering , 2019, EMNLP.
[47] Laura A. Dabbish,et al. "My Data Just Goes Everywhere: " User Mental Models of the Internet and Implications for Privacy and Security , 2015, SOUPS.
[48] Yejin Choi,et al. Neural Motifs: Scene Graph Parsing with Global Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[49] Janice Singer,et al. Exploring the Gender Divide on YouTube: An Analysis of the Creation and Reception of Vlogs , 2008 .
[50] Shih-Fu Chang,et al. VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Benjamin Van Durme,et al. Reporting bias and knowledge acquisition , 2013, AKBC '13.
[52] Ali Farhadi,et al. MERLOT: Multimodal Neural Script Knowledge Models , 2021, NeurIPS.
[53] Zoë Hitzig,et al. Truth from the machine: artificial intelligence and the materialization of identity , 2021, Interdisciplinary Science Reviews.
[54] Ali Farhadi,et al. From Recognition to Cognition: Visual Commonsense Reasoning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Martial Hebert,et al. An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders , 2016, ECCV.
[56] Andrew Zisserman,et al. Self-Supervised MultiModal Versatile Networks , 2020, NeurIPS.
[57] Yejin Choi,et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models , 2020, Findings of EMNLP.
[58] Bo Wu,et al. STAR: A Benchmark for Situated Reasoning in Real-World Videos , 2021 .
[59] Travis L. Dixon,et al. Overrepresentation and Underrepresentation of African Americans and Latinos as Lawbreakers on Television News , 2000 .
[60] Jian Ma,et al. Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100 , 2021, Int. J. Comput. Vis..
[61] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.
[62] Karan Desai,et al. VirTex: Learning Visual Representations from Textual Annotations , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Anaelia Ovalle,et al. Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies , 2021, EMNLP.
[64] Yueting Zhuang,et al. Video Question Answering via Gradually Refined Attention over Appearance and Motion , 2017, ACM Multimedia.
[65] Shengfeng Pan,et al. RoFormer: Enhanced Transformer with Rotary Position Embedding , 2021, ArXiv.
[66] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[67] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[68] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[69] Chen Liang,et al. Carbon Emissions and Large Neural Network Training , 2021, ArXiv.
[70] Emily M. Bender,et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 , 2021, FAccT.
[71] Jack Hessel,et al. Does My Multimodal Model Learn Cross-modal Interactions? It’s Harder to Tell than You Might Think! , 2020, EMNLP.
[72] Rachael Tatman,et al. Gender and Dialect Bias in YouTube’s Automatic Captions , 2017, EthNLP@EACL.
[73] Gabriel Ilharco,et al. Large-Scale Representation Learning from Visually Grounded Untranscribed Speech , 2019, CoNLL.
[74] Jaemin Cho,et al. Unifying Vision-and-Language Tasks via Text Generation , 2021, ICML.
[75] Tegan Maharaj,et al. A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[76] Joachim Bingel,et al. Disembodied Machine Learning: On the Illusion of Objectivity in NLP , 2021, ArXiv.
[77] Michael S. Bernstein,et al. On the Opportunities and Risks of Foundation Models , 2021, ArXiv.
[78] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[79] Zhe Gan,et al. Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[80] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[81] Andreas Dengel,et al. AudioCLIP: Extending CLIP to Image, Text and Audio , 2021, ArXiv.
[82] Justin Salamon,et al. A Dataset and Taxonomy for Urban Sound Research , 2014, ACM Multimedia.
[83] Travis L. Dixon. Crime News and Racialized Beliefs: Understanding the Relationship Between Local News Viewing and Perceptions of African Americans and Crime , 2008 .
[84] Shih-Fu Chang,et al. VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text , 2021, NeurIPS.
[85] Roy Schwartz,et al. Data Efficient Masked Language Modeling for Vision and Language , 2021, EMNLP.
[86] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] Philipp Koehn,et al. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , 2016 .
[88] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[89] Charles Foster,et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling , 2020, ArXiv.
[90] Omkar M. Parkhi,et al. VGGFace2: A Dataset for Recognising Faces across Pose and Age , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[91] Yejin Choi,et al. VinVL: Revisiting Visual Representations in Vision-Language Models , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[92] R Devon Hjelm,et al. Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.
[93] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[94] Kyunghyun Cho,et al. Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models , 2020, ICLR.
[95] Andrew McCallum,et al. Energy and Policy Considerations for Deep Learning in NLP , 2019, ACL.
[96] Takeo Kanade,et al. Computer Vision for Assistive Technologies , 2022, Computer Vision and Image Understanding.
[97] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[98] Yu Cheng,et al. Large-Scale Adversarial Training for Vision-and-Language Representation Learning , 2020, NeurIPS.
[99] Eduard H. Hovy,et al. Five sources of bias in natural language processing , 2021, Lang. Linguistics Compass.
[100] Rohit Girdhar,et al. Anticipative Video Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[101] Felix Gutierrez,et al. White News: Why Local News Programs Don't Cover People of Color , 2000 .
[102] Jieyu Zhao,et al. Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints , 2017, EMNLP.
[103] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[104] Matthijs Douze,et al. Fixing the train-test resolution discrepancy , 2019, NeurIPS.
[105] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[106] Hao Tian,et al. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph , 2020, AAAI.
[107] Ali Farhadi,et al. Watching the World Go By: Representation Learning from Unlabeled Videos , 2020, ArXiv.
[108] Emily Ahn,et al. Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts , 2019, EMNLP.
[109] Jitendra Malik,et al. From Lifestyle Vlogs to Everyday Interactions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[110] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[111] James Glass,et al. AVLnet: Learning Audio-Visual Language Representations from Instructional Videos , 2021, Interspeech 2021.
[112] Lucas Beyer,et al. Big Transfer (BiT): General Visual Representation Learning , 2020, ECCV.
[113] Huaping Liu,et al. Understanding the Behaviour of Contrastive Loss , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[114] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[115] Emily M. Bender,et al. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data , 2020, ACL.
[116] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[117] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.