Artificial intelligence snapchat: Visual conversation agent

Visual conversation is a dialog in which the parties exchange visual information. The key novelty of this paper is an artificial-intelligence-driven method for automating visual conversation. We present a state-of-the-art Artificial Intelligence Snapchat Visual Conversation Agent (AISVCA). AISVCA uses the proposed method to caption a received image and to generate an appropriate, reasonable visual response. These functionalities are achieved with a combination of a Convolutional Neural Network (CNN), a Long Short-Term Memory network (LSTM), and Latent Semantic Indexing (LSI). The CNN and LSTM are used to generate image captions, and LSI is used to assess the semantic similarity between captions generated from a personalized image dataset and the caption extracted from the received image. We show that, using the proposed method, AISVCA can generate a visual response that is practically indistinguishable from a human visual response. To evaluate the approach, we measured the accuracy of the proposed system and conducted a user study of communication quality, in which we analyzed the source credibility and interpersonal attraction of AISVCA. The user study results showed no significant differences in communication quality between a visual conversation with AISVCA and one with a human agent.
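The caption-matching step described above can be illustrated with a minimal LSI sketch: build a term-document matrix over the dataset captions, take a truncated SVD, fold the received image's caption into the latent space, and rank the dataset captions by cosine similarity. The example captions, the vocabulary handling, and the choice of k are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Toy "personalized dataset" captions and one caption extracted from a
# received image. All strings here are illustrative placeholders.
dataset_captions = [
    "a dog playing with a ball in the park",
    "a man riding a bicycle on the street",
    "a plate of pasta with tomato sauce",
]
received_caption = "a puppy chasing a ball on the grass"

def tokenize(text):
    return text.lower().split()

# Vocabulary and term-count vectors over the dataset captions.
vocab = sorted({w for c in dataset_captions for w in tokenize(c)})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    v = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in index:                 # out-of-vocabulary words are dropped
            v[index[w]] += 1.0
    return v

A = np.stack([vectorize(c) for c in dataset_captions], axis=1)  # terms x docs

# LSI: truncated SVD of the term-document matrix, keeping k latent dimensions
# (k = 2 is an arbitrary choice for this toy corpus).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]

def to_latent(v):
    # Fold a term vector into the k-dimensional latent space: v_k = v^T U_k S_k^{-1}
    return (v @ Uk) / sk

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

q = to_latent(vectorize(received_caption))
scores = [cosine(q, to_latent(vectorize(c))) for c in dataset_captions]
best = int(np.argmax(scores))  # index of the semantically closest caption
```

In the full system, the dataset caption with the highest latent-space similarity would point to the personalized image sent back as the visual response.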
