Fast Generating A Large Number of Gumbel-Max Variables

The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or, more generally, from a nonnegative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element i (a Gumbel-Max variable) in proportion to its positive weight v_i, the Gumbel-Max Trick first computes a standard Gumbel random variable g_i for each positive element i and then returns the element with the largest value of g_i + ln v_i. Recently, applications such as similarity estimation and graph embedding have required generating k independent Gumbel-Max variables from high-dimensional vectors, which is computationally expensive for large k (e.g., hundreds or even thousands) under the traditional Gumbel-Max Trick. To solve this problem, we propose a novel algorithm, FastGM, which reduces the time complexity from O(k n_+) to O(k ln k + n_+), where n_+ is the number of positive elements in the vector of interest. Instead of computing k independent Gumbel random variables directly, we observe that these variables can be generated in descending order. Exploiting this observation, FastGM computes the variables g_i + ln v_i for all positive elements i in descending order. As a result, FastGM significantly reduces the computation time, because the procedure of computing Gumbel random variables can be stopped early for many elements, especially those with small weights. Experiments on a variety of real-world datasets show that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional overhead.
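For concreteness, here is a minimal NumPy sketch of the traditional Gumbel-Max Trick described above, i.e., the O(k n_+) baseline that FastGM speeds up; the function name and interface are illustrative and not taken from the paper.

```python
import numpy as np

def gumbel_max_baseline(weights, k, seed=0):
    """Draw k independent Gumbel-Max variables from a nonnegative weight vector.

    For each of the k draws, a fresh standard Gumbel g_i is added to ln(v_i)
    for every positive element i and the argmax is returned; this is the
    traditional trick and costs O(k * n_+) time.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    pos = np.flatnonzero(weights > 0)           # indices of the n_+ positive elements
    log_w = np.log(weights[pos])
    samples = np.empty(k, dtype=int)
    for j in range(k):
        g = rng.gumbel(size=pos.size)           # one standard Gumbel per positive element
        samples[j] = pos[np.argmax(g + log_w)]  # element with the largest g_i + ln v_i
    return samples
```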

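The key observation above is that the Gumbel noise of an element can be produced in descending order, so the computation for an element can stop as soon as its remaining values cannot beat the current best. The sketch below is an illustration of that idea rather than the authors' exact FastGM procedure: it generates k i.i.d. standard Gumbel variables in descending order via the Rényi representation of exponential order statistics, and shows how the descending order enables early termination for a small-weight element (the threshold and weight values are hypothetical).

```python
import numpy as np

def gumbels_descending(k, seed=0):
    """Yield k i.i.d. standard Gumbel variables in descending order.

    Uses the Renyi representation of exponential order statistics: with
    E_(0) = 0 and E_(j) = E_(j-1) + Exp(1) / (k - j + 1), the E_(j) are the
    ascending sorted values of k i.i.d. Exp(1) draws, and -ln(E_(j)) is then
    the j-th largest of k i.i.d. standard Gumbel draws.
    """
    rng = np.random.default_rng(seed)
    e = 0.0
    for j in range(1, k + 1):
        e += rng.exponential() / (k - j + 1)
        yield -np.log(e)

# Illustration of early termination: once g + ln(v_i) drops below a running
# threshold (a placeholder value here), every later g is smaller, so the
# remaining noise terms of this small-weight element can be skipped entirely.
threshold = 2.5                  # hypothetical current best register value
log_vi = np.log(0.1)             # log-weight of a small-weight element
drawn = 0
for g in gumbels_descending(k=1000):
    drawn += 1
    if g + log_vi <= threshold:
        break                    # safe to stop: no further draw can exceed the threshold
print(f"stopped after {drawn} of 1000 Gumbel draws")
```

Because the generated values only decrease, elements with small weights exit this loop after a handful of draws, which is the source of the reported speed-up over drawing all k Gumbel variables per element.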