Establishing an evaluation metric to quantify climate change image realism

Following their success on controlled tasks, generative models are increasingly being applied to humanitarian applications [1,2]. In this paper, we focus on evaluating a conditional generative model that illustrates the consequences of climate change-induced flooding, with the aim of encouraging public interest in and awareness of the issue. Because no metrics exist for comparing the realism of different modes in a conditional generative model, we propose several automated and human-based evaluation methods. To do this, we adapt several existing metrics and assess the automated metrics against gold-standard human evaluation. We find that Fréchet Inception Distance (FID) computed on embeddings from an intermediate Inception-V3 layer that precedes the auxiliary classifier produces the results most correlated with human judgments of realism. While this is insufficient on its own to establish a human-correlated automatic evaluation metric, we believe this work begins to bridge the gap between human and automated evaluation procedures for generative models.
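As a rough illustration of the distance underlying FID, the sketch below computes the Fréchet distance between two sets of feature embeddings under a Gaussian assumption. It is a minimal sketch, not the paper's implementation: the function name `frechet_distance` and the random arrays standing in for embeddings are hypothetical, and extracting real embeddings from a particular Inception-V3 layer is not shown.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    Each input has shape (n_samples, n_features) with n_features >= 2.
    Hypothetical helper for illustration only.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Stand-in embeddings (assumption: real ones would come from a network layer).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 8))
fake_feats = real_feats + 3.0  # same covariance, shifted mean

print(frechet_distance(real_feats, real_feats))
print(frechet_distance(real_feats, fake_feats))
```

Lower values indicate that the two embedding distributions are closer. In the toy data above the second set differs from the first only by a constant shift, so the distance reduces to the squared norm of the mean difference.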

[1] S. O'Neill, et al. An iconic approach for representing climate change, 2009.

[2] John Le, et al. Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution, 2010.

[3] Christian Früh, et al. Google Street View: Capturing the World at Street Level, 2010, Computer.

[4] E. Weber, et al. Public Understanding of Climate Change in the United States, 2011.

[5] N. Pidgeon. Climate Change Risk Perception and Communication: Addressing a Critical Moment?, 2012, Risk Analysis: An Official Publication of the Society for Risk Analysis.

[6] S. O'Neill, et al. On the use of imagery for climate change engagement, 2013.

[7] Simon Osindero, et al. Conditional Generative Adversarial Nets, 2014, ArXiv.

[8] J. K. Rød, et al. Climate change, natural hazards, and risk perception: the role of proximity and personal experience, 2015.

[9] Eric Gilbert, et al. Comparing Person- and Process-centric Strategies for Obtaining Quality Data on Amazon Mechanical Turk, 2015, CHI.

[10] Honglak Lee, et al. Attribute2Image: Conditional Image Generation from Visual Attributes, 2015, ECCV.

[11] Wojciech Zaremba, et al. Improved Techniques for Training GANs, 2016, NIPS.

[12] Sebastian Ramos, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.

[14] Peter Kontschieder, et al. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15] Su Ruan, et al. Medical Image Synthesis with Context-Aware Generative Adversarial Networks, 2016, MICCAI.

[16] Jan Kautz, et al. Multimodal Unsupervised Image-to-Image Translation, 2018, ECCV.

[17] Arthur Gretton, et al. Demystifying MMD GANs, 2018, ICLR.

[18] Jeff Donahue, et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018, ICLR.

[19] M. Fitzpatrick, et al. Contemporary climatic analogs for 540 North American urban areas in the late 21st century, 2019, Nature Communications.

[20] Benjamin Recht, et al. Do ImageNet Classifiers Generalize to ImageNet?, 2019, ICML.

[21] Michael S. Bernstein, et al. HYPE: Human eYe Perceptual Evaluation of Generative Models, 2019, DGS@ICLR.

[22] Graham W. Taylor, et al. On the Evaluation of Conditional GANs, 2019, ArXiv.

[23] Luisa Verdoliva, et al. Do GANs Leave Artificial Fingerprints?, 2018, 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).

[24] Timo Aila, et al. A Style-Based Generator Architecture for Generative Adversarial Networks, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Michael S. Bernstein, et al. HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models, 2019, NeurIPS.

[26] Bolei Zhou, et al. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks, 2018, ICLR.