VisualTextRank: Unsupervised Graph-based Content Extraction for Automating Ad Text to Image Search

Numerous online stock image libraries offer high-quality yet copyright-free images for use in marketing campaigns. To assist advertisers in navigating such third-party libraries, we study the problem of automatically fetching relevant ad images given the ad text (via a short textual query for images). Motivated by our observations in logged data on ad image search queries (given ad text), we formulate a keyword extraction problem, where a keyword extracted from the ad text (or its augmented version) serves as the ad image query. In this context, we propose VisualTextRank: an unsupervised method to (i) augment the input ad text using semantically similar ads, and (ii) extract the image query from the augmented ad text. VisualTextRank builds on prior work on graph-based content extraction (biased TextRank in particular) by leveraging both the text and image of similar ads for better keyword extraction, and by using advertiser-category-specific biasing with Sentence-BERT embeddings. Using data collected from the Verizon Media Native (Yahoo Gemini) ad platform's stock image search feature for onboarding advertisers, we demonstrate the superiority of VisualTextRank over competitive keyword extraction baselines (including an 11% accuracy lift over biased TextRank). For the case where the stock image library is restricted to English queries, we show the effectiveness of VisualTextRank on multilingual ads (translated to English) while leveraging semantically similar English ads. Online tests with a simplified version of VisualTextRank led to a 28.7% increase in the usage of stock image search and a 41.6% increase in the advertiser onboarding rate on the Verizon Media Native ad platform.
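To make the extraction step concrete, the sketch below illustrates one way a biased-TextRank-style keyword ranking with Sentence-BERT embeddings and advertiser-category biasing could look: candidate keywords form graph nodes, edges are weighted by embedding similarity, the bias vector favors keywords close to the category description, and personalized PageRank selects the image query. This is a minimal sketch under assumptions; the model name, candidate tokens, damping factor, and iteration count are illustrative choices, not the authors' configuration, and it omits the similar-ad text/image augmentation step.

```python
# Illustrative sketch of biased-TextRank-style keyword extraction with
# Sentence-BERT embeddings and category biasing. Model choice, damping
# factor, and iteration count are assumptions for demonstration only.
import numpy as np
from sentence_transformers import SentenceTransformer


def extract_image_query(candidates, category_text, damping=0.85, iters=50):
    """Rank candidate keywords from the (augmented) ad text, biased toward
    the advertiser category, and return the top keyword as the image query."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    tok_emb = model.encode(candidates, normalize_embeddings=True)
    cat_emb = model.encode([category_text], normalize_embeddings=True)[0]

    # Edge weights: pairwise cosine similarity between candidate keywords.
    sim = tok_emb @ tok_emb.T
    np.fill_diagonal(sim, 0.0)
    sim = np.clip(sim, 0.0, None)
    row_sums = sim.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    transition = sim / row_sums  # row-stochastic transition matrix

    # Bias vector: similarity of each keyword to the advertiser category.
    bias = np.clip(tok_emb @ cat_emb, 0.0, None)
    if bias.sum() > 0:
        bias = bias / bias.sum()
    else:
        bias = np.ones(len(candidates)) / len(candidates)

    # Personalized PageRank (power iteration), as in biased TextRank.
    scores = np.ones(len(candidates)) / len(candidates)
    for _ in range(iters):
        scores = (1 - damping) * bias + damping * (transition.T @ scores)

    return candidates[int(np.argmax(scores))]


# Example: pick an image query for a travel ad, biased toward its category.
tokens = ["beach", "resort", "discount", "booking", "family", "vacation"]
print(extract_image_query(tokens, "travel and tourism"))
```

In this sketch, the category bias plays the role of the "focus" text in biased TextRank: without it, the walk reduces to standard TextRank over embedding similarities; with it, keywords semantically close to the advertiser category are favored as image queries.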
