Less is More: Pursuing the Visual Turing Test with the Kuleshov Effect

The Turing test centers on the idea that if a computer can trick a human into believing it is human, the machine is deemed intelligent, or at least indistinguishable from people. Designing a visual Turing test involves recognizing objects and their relationships in images and devising a method to derive new concepts from the visual information. Until now, the proposed visual tests have relied heavily on natural language processing to conduct a questionnaire or tell a story. We deviate from the mainstream and propose reframing the visual Turing test through the Kuleshov effect, avoiding written or spoken language altogether. The idea rests on elucidating a method that creates the concept of montage synthetically. As in the early days of cinema, we aim to convey messages through the interpretation of image shots that a machine can decipher, and we compare its interpretations with those scored by humans. The first implementation of this new test uses images from a psychology study in which the circumplex model is applied to rate each image. We consider five deep learning methodologies and eight optimizers, and through semiotics we derive an emotional state in the computer. The results are promising and confirm that this version of the visual Turing test is a challenging new research avenue.
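Although the abstract does not spell out the implementation, its core technical step (regressing an image's position in the circumplex model and reading off an emotional state) can be sketched. The following is a minimal illustration in PyTorch, assuming a pretrained ResNet-50 stands in for one of the five deep learning methodologies, Adam for one of the eight optimizers, and a dataset such as OASIS with valence/arousal ratings on a 1-7 scale; the class name `CircumplexRegressor`, the quadrant labels, and the neutral midpoint are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: fine-tune a pretrained CNN to regress the two circumplex
# dimensions (valence, arousal) of an image, then map the prediction
# to an emotional quadrant. Backbone, optimizer, and labels are
# assumptions for illustration, not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision import models

class CircumplexRegressor(nn.Module):
    """ResNet-50 backbone with a 2-unit head for (valence, arousal)."""
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet50(
            weights=models.ResNet50_Weights.IMAGENET1K_V1)
        # Replace the 1000-way ImageNet classifier with a 2-output regressor.
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 2)

    def forward(self, x):
        return self.backbone(x)  # shape (batch, 2): valence, arousal

def quadrant(valence: float, arousal: float, midpoint: float = 4.0) -> str:
    """Map a (valence, arousal) rating to a circumplex quadrant.
    Assumes ratings on a 1-7 scale, so 4.0 is the neutral midpoint."""
    if valence >= midpoint:
        return "excited/happy" if arousal >= midpoint else "calm/content"
    return "tense/angry" if arousal >= midpoint else "sad/bored"

model = CircumplexRegressor()
criterion = nn.MSELoss()  # regress toward the human valence/arousal scores
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Usage on a dummy image: predict the two dimensions and name the quadrant.
dummy = torch.randn(1, 3, 224, 224)
v, a = model(dummy)[0].tolist()
print(quadrant(v, a))
```

Training would minimize the mean-squared error between predicted and human-rated (valence, arousal) pairs; the semiotic reading of a montage would then compare the machine's derived emotional state with the human scores.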
