DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models

With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.
