VisRel: Media Search at Scale

In this paper, we present VisRel, a deployed large-scale media search system that leverages text understanding, media understanding, and multimodal technologies to deliver a modern multimedia search experience. We share our insights on developing image and video understanding models for content retrieval, training efficient and effective media-to-query relevance models, and refining the online and offline metrics used to measure the success of one of the largest media search databases in the industry. We summarize the lessons learned from hundreds of A/B test experiments and describe the most effective technical approaches. The techniques presented in this work have contributed a 34% (absolute) improvement in media-to-query relevance and a 10% improvement in user engagement. We believe that this work can provide practical solutions and insights for engineers interested in applying media understanding technologies to power multimedia search systems that operate at Facebook scale.
