On Efficient Retrieval of Top Similarity Vectors

Retrieval of relevant vectors produced by representation learning critically influences the efficiency of natural language processing (NLP) tasks. In this paper, we present an efficient method for searching vectors under a typical non-metric matching function: the inner product. Our method constructs an approximate Inner Product Delaunay Graph (IPDG) for top-1 Maximum Inner Product Search (MIPS), which turns retrieval of the most suitable latent vectors into a graph search problem and yields substantial efficiency gains. Experiments on data representations learned for different machine learning tasks verify the superior effectiveness and efficiency of the proposed IPDG.
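
To make the problem setting concrete, the following is a minimal sketch of top-1 MIPS as an exhaustive scan and as a greedy walk over a neighbor graph. It is not the IPDG construction from the paper: the graph builder below is a hypothetical stand-in that links each item to the items with the largest inner product with it, and all function names are illustrative assumptions.

```python
def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def brute_force_mips(query, items):
    """Exact top-1 MIPS: scan every item and return the index of the best one."""
    return max(range(len(items)), key=lambda i: inner(query, items[i]))


def build_ip_neighbor_graph(items, degree):
    """Hypothetical stand-in for an index graph (NOT the paper's IPDG):
    link each item to the `degree` other items with the largest inner product."""
    graph = []
    for i, x in enumerate(items):
        others = [j for j in range(len(items)) if j != i]
        others.sort(key=lambda j: inner(x, items[j]), reverse=True)
        graph.append(others[:degree])
    return graph


def greedy_graph_search(query, items, graph, start):
    """Greedy walk: move to the best-scoring neighbor until no neighbor improves.
    The result is approximate; it can stop at a local maximum."""
    current = start
    best = inner(query, items[current])
    while True:
        improved = False
        for nb in graph[current]:
            score = inner(query, items[nb])
            if score > best:
                best, current, improved = score, nb, True
        if not improved:
            return current


# Tiny usage example with made-up 2-D vectors.
items = [[1, 0], [0, 1], [2, 2], [-1, -1]]
query = [1, 1]
graph = build_ip_neighbor_graph(items, degree=3)
exact = brute_force_mips(query, items)
approx = greedy_graph_search(query, items, graph, start=3)
print(exact, approx)
```

The exhaustive scan costs one inner product per item, whereas the greedy walk only scores the neighbors of the vertices it visits. How well the walk approximates the exact answer depends on how the graph is built.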
