Fast Generating A Large Number of Gumbel-Max Variables

The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or, more generally, from a nonnegative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element i (a Gumbel-Max variable) in proportion to its positive weight v_i, the Gumbel-Max Trick first computes a standard Gumbel random variable g_i for each positive element i and then returns the element with the largest value of g_i + ln v_i. Recently, applications such as similarity estimation and graph embedding have required generating k independent Gumbel-Max variables from high-dimensional vectors, which is computationally expensive for large k (e.g., hundreds or even thousands) under the traditional Gumbel-Max Trick. To solve this problem, we propose a novel algorithm, FastGM, which reduces the time complexity from O(k n_+) to O(k ln k + n_+), where n_+ is the number of positive elements in the vector of interest. Instead of computing k independent Gumbel random variables directly, we observe that these variables can be generated in descending order. Exploiting this observation, FastGM computes the variables g_i + ln v_i for all positive elements i in descending order. As a result, FastGM significantly reduces the computation time, because the procedure of computing Gumbel random variables can be stopped early for many elements, especially those with small weights. Experiments on a variety of real-world datasets show that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional overhead.
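For concreteness, here is a minimal NumPy sketch of the traditional Gumbel-Max Trick described above, i.e., the O(k n_+) baseline that FastGM speeds up; the function name and interface are illustrative and not taken from the paper.

```python
import numpy as np

def gumbel_max_baseline(weights, k, seed=0):
    """Draw k independent Gumbel-Max variables from a nonnegative weight vector.

    For each of the k draws, a fresh standard Gumbel g_i is added to ln(v_i)
    for every positive element i and the argmax is returned; this is the
    traditional trick and costs O(k * n_+) time.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    pos = np.flatnonzero(weights > 0)           # indices of the n_+ positive elements
    log_w = np.log(weights[pos])
    samples = np.empty(k, dtype=int)
    for j in range(k):
        g = rng.gumbel(size=pos.size)           # one standard Gumbel per positive element
        samples[j] = pos[np.argmax(g + log_w)]  # element with the largest g_i + ln v_i
    return samples
```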

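The key observation above is that the Gumbel noise of an element can be produced in descending order, so the computation for an element can stop as soon as its remaining values cannot beat the current best. The sketch below is an illustration of that idea rather than the authors' exact FastGM procedure: it generates k i.i.d. standard Gumbel variables in descending order via the Rényi representation of exponential order statistics, and shows how the descending order enables early termination for a small-weight element (the threshold and weight values are hypothetical).

```python
import numpy as np

def gumbels_descending(k, seed=0):
    """Yield k i.i.d. standard Gumbel variables in descending order.

    Uses the Renyi representation of exponential order statistics: with
    E_(0) = 0 and E_(j) = E_(j-1) + Exp(1) / (k - j + 1), the E_(j) are the
    ascending sorted values of k i.i.d. Exp(1) draws, and -ln(E_(j)) is then
    the j-th largest of k i.i.d. standard Gumbel draws.
    """
    rng = np.random.default_rng(seed)
    e = 0.0
    for j in range(1, k + 1):
        e += rng.exponential() / (k - j + 1)
        yield -np.log(e)

# Illustration of early termination: once g + ln(v_i) drops below a running
# threshold (a placeholder value here), every later g is smaller, so the
# remaining noise terms of this small-weight element can be skipped entirely.
threshold = 2.5                  # hypothetical current best register value
log_vi = np.log(0.1)             # log-weight of a small-weight element
drawn = 0
for g in gumbels_descending(k=1000):
    drawn += 1
    if g + log_vi <= threshold:
        break                    # safe to stop: no further draw can exceed the threshold
print(f"stopped after {drawn} of 1000 Gumbel draws")
```

Because the generated values only decrease, elements with small weights exit this loop after a handful of draws, which is the source of the reported speed-up over drawing all k Gumbel variables per element.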