Efficient Sketching Algorithm for Sparse Binary Data
[1] Andrei Broder, et al. Network Applications of Bloom Filters: A Survey, 2004, Internet Math.
[2] Maosong Sun, et al. Semi-Supervised SimHash for Efficient Document Similarity Search, 2011, ACL.
[3] Monika Henzinger, et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms, 2006, SIGIR.
[4] Silvio Lattanzi, et al. On compressing social networks, 2009, KDD.
[5] Inderjit S. Dhillon, et al. Concept Decompositions for Large Sparse Text Data Using Clustering, 2004, Machine Learning.
[6] John Langford, et al. A reliable effective terascale linear learning system, 2011, J. Mach. Learn. Res.
[7] Joshua Zhexue Huang, et al. Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values, 1998, Data Mining and Knowledge Discovery.
[8] David P. Williamson, et al. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, 1995, JACM.
[9] Patrick Haffner, et al. Support vector machines for histogram-based image classification, 1999, IEEE Trans. Neural Networks.
[10] Ping Li, et al. In Defense of Minhash over Simhash, 2014, AISTATS.
[11] Raghav Kulkarni, et al. Efficient Dimensionality Reduction for Sparse Binary Data, 2018, IEEE International Conference on Big Data (Big Data).
[12] Yogish Sabharwal, et al. Analysis of sampling techniques for association rule mining, 2009, ICDT '09.
[13] Chong-Wah Ngo, et al. Towards optimal bag-of-features for object categorization and semantic video retrieval, 2007, CIVR '07.
[14] Alan M. Frieze, et al. Min-wise independent permutations (extended abstract), 1998, STOC '98.
[15] Shih-Fu Chang, et al. Circulant Binary Embedding, 2014, ICML.
[16] Spiridon Bakiras, et al. Secure Similar Document Detection with Simhash, 2013, Secure Data Management.
[17] Derek Greene, et al. Practical solutions to the problem of diagonal dominance in kernel document clustering, 2006, ICML.
[18] Roberto J. Bayardo, et al. Scaling up all pairs similarity search, 2007, WWW '07.
[19] Raghav Kulkarni, et al. Efficient Compression Technique for Sparse Sets, 2018, PAKDD.
[20] Ping Li, et al. Asymmetric Minwise Hashing for Indexing Binary Inner Products and Set Containment, 2015, WWW.
[21] Andrei Z. Broder, et al. Identifying and Filtering Near-Duplicate Documents, 2000, CPM.
[22] Rameshwar Pratap, et al. A Faster Sampling Algorithm for Spherical k-means, 2018, ACML.
[23] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[24] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms, 2002, STOC '02.
[25] Andrei Z. Broder, et al. On the resemblance and containment of documents, 1997, Compression and Complexity of SEQUENCES 1997.
[26] Anshumali Shrivastava, et al. Optimal Densification for Fast and Accurate Minwise Hashing, 2017, ICML.
[27] Matthias Hein, et al. Hilbertian Metrics and Positive Definite Kernels on Probability Measures, 2005, AISTATS.
[28] Rasmus Pagh, et al. Efficient estimation for high similarities using odd sketches, 2014, WWW.
[29] Dmitri Loguinov, et al. Probabilistic near-duplicate detection using simhash, 2011, CIKM '11.