Multimodal Procedural Planning via Dual Text-Image Prompting
William Yang Wang | Yujie Lu | Wanrong Zhu | Zhiyu Chen | Pan Lu | X. Wang
[1] William Yang Wang,et al. Visualize Before You Write: Imagination-Guided Open-Ended Text Generation , 2022, FINDINGS.
[2] Chan Hee Song,et al. LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models , 2022, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Alexei A. Efros,et al. InstructPix2Pix: Learning to Follow Image Editing Instructions , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Duen Horng Chau,et al. DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models , 2022, ArXiv.
[5] Dong Yu,et al. Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination , 2022, EMNLP.
[6] Hou Pong Chan,et al. Multimedia Generative Script Learning for Task Planning , 2022, ArXiv.
[7] S. Gu,et al. Large Language Models are Zero-Shot Reasoners , 2022, NeurIPS.
[8] David J. Fleet,et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding , 2022, NeurIPS.
[9] Derek Hoiem,et al. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners , 2022, NeurIPS.
[10] Dani Yogatama,et al. Language Models Can See: Plugging Visual Controls in Text Generation , 2022, ArXiv.
[11] Oriol Vinyals,et al. Flamingo: a Visual Language Model for Few-Shot Learning , 2022, NeurIPS.
[12] Prafulla Dhariwal,et al. Hierarchical Text-Conditional Image Generation with CLIP Latents , 2022, ArXiv.
[13] S. Levine,et al. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances , 2022, CoRL.
[14] Adrian S. Wong,et al. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language , 2022, ICLR.
[15] Yaniv Taigman,et al. Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors , 2022, ECCV.
[16] Dongyan Zhao,et al. Things not Written in Text: Exploring Spatial Commonsense from Visual Signals , 2022, ACL.
[17] Jey Han Lau,et al. An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation , 2022, ACL.
[18] Song-Chun Zhu,et al. Triangular Character Animation Sampling with Motion, Emotion, and Relation , 2022, ArXiv.
[19] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[20] Jingren Zhou,et al. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework , 2022, ICML.
[21] S. Hoi,et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation , 2022, ICML.
[22] P. Abbeel,et al. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents , 2022, ICML.
[23] B. Ommer,et al. High-Resolution Image Synthesis with Latent Diffusion Models , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Prafulla Dhariwal,et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models , 2021, ICML.
[25] Xizhou Zhu,et al. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Fang Wen,et al. Vector Quantized Diffusion Model for Text-to-Image Synthesis , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] R. Weischedel,et al. Understanding Multimodal Procedural Knowledge by Sequencing Multimodal Instructional Manuals , 2021, ACL.
[28] Lydia B. Chilton,et al. Design Guidelines for Prompt Engineering Text-to-Image Generative Models , 2021, CHI.
[29] Zhe Gan,et al. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA , 2021, AAAI.
[30] William Yang Wang,et al. Neuro-Symbolic Causal Language Planning with Commonsense Prompting , 2022, ArXiv.
[31] Qingcai Chen,et al. Semi-supervised Visual Feature Integration for Language Models through Sentence Visualization , 2021, ICMI.
[32] Rémi Calizzano,et al. Ordering sentences and paragraphs with pre-trained encoder-decoder transformers and pointer ensembles , 2021, DocEng.
[33] Rajshekhar Sunderraman,et al. Improving Text-to-Image Synthesis Using Contrastive Learning , 2021, BMVC.
[34] Oriol Vinyals,et al. Multimodal Few-Shot Learning with Frozen Language Models , 2021, NeurIPS.
[35] Song-Chun Zhu,et al. SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues , 2021, ACL.
[36] Chang Zhou,et al. CogView: Mastering Text-to-Image Generation via Transformers , 2021, NeurIPS.
[37] Ronan Le Bras,et al. CLIPScore: A Reference-free Evaluation Metric for Image Captioning , 2021, EMNLP.
[38] Song-Chun Zhu,et al. Towards Socially Intelligent Agents with Mental State Transition and Human Value , 2021, SIGDIAL.
[39] Alec Radford,et al. Zero-Shot Text-to-Image Generation , 2021, ICML.
[40] Jaemin Cho,et al. Unifying Vision-and-Language Tasks via Text Generation , 2021, ICML.
[41] Jing Yu Koh,et al. Cross-Modal Contrastive Learning for Text-to-Image Generation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Alec Radford,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[43] Peter Alexander Jansen. Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions , 2020, FINDINGS.
[44] Chris Callison-Burch,et al. Reasoning about Goals, Steps, and Temporal Ordering with WikiHow , 2020, EMNLP.
[45] N. Sebe,et al. DF-GAN: Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis , 2020, ArXiv.
[46] Pieter Abbeel,et al. Denoising Diffusion Probabilistic Models , 2020, NeurIPS.
[47] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[48] Juan Carlos Niebles,et al. Procedure Planning in Instructional Videos , 2019, ECCV.
[49] Seungmin Seo,et al. Topic-Guided Coherence Modeling for Sentence Ordering by Preserving Global and Local Information , 2019, EMNLP.
[50] Yang Song,et al. Generative Modeling by Estimating Gradients of the Data Distribution , 2019, NeurIPS.
[51] Wei Chen,et al. DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Nazli Ikizler-Cinbis,et al. RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes , 2018, EMNLP.
[53] Claudiu Musat,et al. Goal-Oriented Chatbot Dialog Management Bootstrapping with Transfer Learning , 2018, IJCAI.
[54] Zhe Gan,et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[55] Zhongfei Zhang,et al. Deep Attentive Sentence Ordering Network , 2018, EMNLP.
[56] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[57] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[58] Xuanjing Huang,et al. Neural Sentence Ordering , 2016, ArXiv.
[59] Matt J. Kusner,et al. From Word Embeddings To Document Distances , 2015, ICML.
[60] Surya Ganguli,et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics , 2015, ICML.
[61] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[62] Matthew R. Walter,et al. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , 2011, AAAI.
[63] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[64] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.