Paper Information - Contextual RNN-GANs for Abstract Reasoning Diagram Generation

Understanding, predicting, and generating object motions and transformations is a core problem in artificial intelligence. Modeling sequences of evolving images may provide better representations and models of motion and may ultimately be used for forecasting, simulation, or video generation. Diagrammatic Abstract Reasoning is an avenue in which diagrams evolve in complex patterns and one needs to infer the underlying pattern sequence and generate the next image in the sequence. For this, we develop a novel Contextual Generative Adversarial Network based on Recurrent Neural Networks (Context-RNN-GANs), where both the generator and the discriminator modules are based on contextual history (modeled as RNNs) and the adversarial discriminator guides the generator to produce realistic images for the particular time step in the image sequence. We evaluate the Context-RNN-GAN model (and its variants) on a novel dataset of Diagrammatic Abstract Reasoning, where it performs competitively with 10th-grade human performance but there is still scope for interesting improvements as compared to college-grade human performance. We also evaluate our model on a standard video next-frame prediction task, achieving improved performance over comparable state-of-the-art.
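The architecture described in the abstract, a generator and a discriminator that both condition on the history of earlier frames through an RNN, can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the single-layer tanh RNN, the frame dimension, the random weights, the standard GAN losses, and all function names (`init_rnn`, `rnn_outputs`, `generate_next`, `discriminate`) are hypothetical choices made here for clarity. Only the forward pass and the loss values are shown; no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_rnn(in_dim, hid_dim, out_dim, scale=0.1):
    """Random weights for a single-layer tanh RNN with a linear readout (assumed design)."""
    return {
        "Wx": rng.normal(0.0, scale, (hid_dim, in_dim)),
        "Wh": rng.normal(0.0, scale, (hid_dim, hid_dim)),
        "b": np.zeros(hid_dim),
        "Wo": rng.normal(0.0, scale, (out_dim, hid_dim)),
        "bo": np.zeros(out_dim),
    }

def rnn_outputs(params, seq):
    """Run the RNN over seq of shape (T, in_dim); return readouts of shape (T, out_dim)."""
    h = np.zeros(params["b"].shape)
    outs = []
    for x in seq:
        # The hidden state h summarizes every input seen so far (the context).
        h = np.tanh(params["Wx"] @ x + params["Wh"] @ h + params["b"])
        outs.append(params["Wo"] @ h + params["bo"])
    return np.stack(outs)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generate_next(g_params, history):
    """Generator: at step t, propose frame t+1 from frames 0..t."""
    return rnn_outputs(g_params, history)

def discriminate(d_params, history, candidates):
    """Discriminator: at step t, score the candidate for frame t+1 given frames 0..t."""
    pairs = np.concatenate([history, candidates], axis=1)
    return sigmoid(rnn_outputs(d_params, pairs)[:, 0])

# Toy sequence: 6 frames, each flattened to a 16-dimensional vector (assumed sizes).
frame_dim, hid_dim = 16, 32
frames = rng.normal(size=(6, frame_dim))
history, targets = frames[:-1], frames[1:]

G = init_rnn(frame_dim, hid_dim, frame_dim)
D = init_rnn(2 * frame_dim, hid_dim, 1)

fake = generate_next(G, history)
d_real = discriminate(D, history, targets)
d_fake = discriminate(D, history, fake)

# Standard GAN objectives, used here as an assumption about the adversarial term.
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
g_loss = -np.mean(np.log(d_fake))
print(fake.shape, d_real.shape, float(d_loss), float(g_loss))
```

Because the discriminator's hidden state also carries the earlier frames, its score for a candidate depends on the time step and on the pattern so far, which is what lets it push the generator toward an image that fits that particular position in the sequence.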
