Paper information - New Algorithmic Tools for Distributed Similarity Search and Edge Estimation

[20] Rina Panigrahy,et al. Lower Bounds on Near Neighbor Search via Metric Expansion , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[21] Cyrus Rashtchian,et al. Edge Estimation with Independent Set Oracles , 2017, ITCS.

[22] Pradeep Dubey,et al. Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing , 2013, Proc. VLDB Endow..

[23] Sanjoy Dasgupta,et al. Incremental Clustering: The Case for Extra Clusters , 2014, NIPS.

[24] Rajeev Motwani,et al. Lower bounds on locality sensitive hashing , 2005, SCG '06.

[25] Guillaume J. Filion,et al. Starcode: sequence clustering based on all-pairs search , 2015, Bioinform..

[26] Marina Meila,et al. An Experimental Comparison of Model-Based Clustering Methods , 2004, Machine Learning.

[27] Alessandro Panconesi,et al. Concentration of Measure for the Analysis of Randomized Algorithms , 2009 .

[28] Guoliang Li,et al. Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints , 2013, EDBT '13.

[29] Rafail Ostrovsky,et al. Low distortion embeddings for edit distance , 2007, JACM.

[30] Teofilo F. Gonzalez,et al. Clustering to Minimize the Maximum Intercluster Distance , 1985, Theor. Comput. Sci..

[31] Man Lung Yiu,et al. Identifying the Most Connected Vertices in Hidden Bipartite Graphs Using Group Testing , 2013, IEEE Transactions on Knowledge and Data Engineering.

[32] Huzefa Rangwala,et al. Efficient Clustering of Metagenomic Sequences using Locality Sensitive Hashing , 2012, SDM.

[33] L. H. Harper. Optimal Assignments of Numbers to Vertices , 1964 .

[34] Shai Ben-David,et al. Clustering Oligarchies , 2013, AISTATS.

[35] Mikkel Thorup. High Speed Hashing for Integers and Strings , 2015, ArXiv.

[36] Jeffrey D. Ullman,et al. Upper and Lower Bounds on the Cost of a Map-Reduce Computation , 2012, Proc. VLDB Endow..

[37] Alexandr Andoni,et al. Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing , 2015, SoCG.

[38] Atsuyoshi Nakamura,et al. On Practical Accuracy of Edit Distance Approximation Algorithms , 2017, ArXiv.

[39] Maria-Florina Balcan,et al. Robust hierarchical clustering , 2013, J. Mach. Learn. Res..

[40] Rina Panigrahy,et al. A Geometric Approach to Lower Bounds for Approximate Near-Neighbor Search and Partial Match , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[41] Alexandr Andoni,et al. Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors , 2016, SODA.

[42] Will Rosenbaum,et al. On Sampling Edges Almost Uniformly , 2017, SOSA.

[43] Andrei Z. Broder,et al. On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[44] Maria-Florina Balcan,et al. Clustering under approximation stability , 2013, JACM.

[45] Santosh S. Vempala,et al. On clusterings-good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[46] Rasmus Pagh,et al. On the Complexity of Inner Product Similarity Join , 2015, PODS.

[47] Ali S. Hadi,et al. Finding Groups in Data: An Introduction to Cluster Analysis , 1991 .

[48] J. Ott,et al. Complement Factor H Polymorphism in Age-Related Macular Degeneration , 2005, Science.

[49] LihChyun Shu,et al. Locality sensitive hashing revisited: filling the gap between theory and algorithm analysis , 2013, CIKM.

[50] Rasmus Pagh. Locality-sensitive Hashing without False Negatives , 2016, SODA.

[51] Esko Ukkonen,et al. Approximate String Matching with q-grams and Maximal Matches , 1992, Theor. Comput. Sci..

[52] Rasmus Pagh,et al. I/O-Efficient Similarity Join , 2017, Algorithmica.

[53] Ahmed K. Elmagarmid,et al. Duplicate Record Detection: A Survey , 2007, IEEE Transactions on Knowledge and Data Engineering.

[54] Dana Ron,et al. Approximately Counting Triangles in Sublinear Time , 2017, SIAM J. Comput..

[55] Larry J. Stockmeyer,et al. On Approximation Algorithms for #P , 1985, SIAM J. Comput..

[56] David Conlon,et al. Finite reflection groups and graph norms , 2016, 1611.05784.

[57] Rudolf Ahlswede,et al. Appendix: On Edge-Isoperimetric Theorems for Uniform Hypergraphs , 2006, GTIT-C.

[58] Luis Ceze,et al. A DNA-Based Archival Storage System , 2017 .

[59] L. H. Harper. On a problem of Kleitman and West , 1991, Discret. Math..

[60] Andrew C. Yao,et al. Lower bounds by probabilistic arguments , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[61] Andrei Z. Broder,et al. Identifying and Filtering Near-Duplicate Documents , 2000, CPM.

[62] William H. Swallow,et al. Group testing for estimating infection rates and probabilities of disease transmission , 1985 .

[63] Martin Dietzfelbinger,et al. Universal Hashing and k-Wise Independent Random Variables via Integer Arithmetic without Primes , 1996, STACS.

[64] Terence Tao,et al. A new bound on partial sum-sets and difference-sets, and applications to the Kakeya conjecture , 1999 .

[65] Larry J. Stockmeyer. The Complexity of Approximate Counting (Preliminary Version) , 1983, STOC 1983.

[66] Elchanan Mossel,et al. Sequence assembly from corrupted shotgun reads , 2016, 2016 IEEE International Symposium on Information Theory (ISIT).

[67] Luis Gravano,et al. Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.

[68] Richard Durbin,et al. Fast and accurate short read alignment with Burrows-Wheeler transform , 2009 .

[69] J. MacQueen. Some methods for classification and analysis of multivariate observations , 1967 .

[70] Alexander Sidorenko,et al. A correlation inequality for bipartite graphs , 1993, Graphs Comb..

[71] Ashish Goel,et al. Dimension independent similarity computation , 2012, J. Mach. Learn. Res..

[72] Jeffrey Xu Yu,et al. Efficient similarity joins for near-duplicate detection , 2011, TODS.

[73] Andrew McCallum,et al. Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.

[74] Nikhil Bansal,et al. Correlation Clustering , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings..

[75] Noga Alon,et al. Non-averaging Subsets and Non-vanishing Transversals , 1999, J. Comb. Theory, Ser. A.

[76] B. Lindström,et al. A Generalization of a Combinatorial Theorem of Macaulay , 1969 .

[77] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.

[78] Alexandr Andoni,et al. The Smoothed Complexity of Edit Distance , 2008, ICALP.

[79] Renée J. Miller,et al. Framework for Evaluating Clustering Algorithms in Duplicate Detection , 2009, Proc. VLDB Endow..

[80] Yi Sun,et al. Hash^ed-Join: Approximate String Similarity Join with Hashing , 2014, DASFAA Workshops.

[81] C. Seshadhri,et al. A simpler sublinear algorithm for approximating the triangle count , 2015, ArXiv.

[82] Sreenivas Gollapudi,et al. A dictionary for approximate string search and longest prefix search , 2006, CIKM '06.

[83] Aleksei V. Fishkin,et al. Disk Graphs: A Short Survey , 2003, WAOA.

[84] Russell Impagliazzo,et al. On the Complexity of k-SAT , 2001, J. Comput. Syst. Sci..

[85] Michiel H. M. Smid,et al. Sequential and parallel algorithms for the k closest pairs problem , 1995, Int. J. Comput. Geom. Appl..

[86] Holger Dell,et al. Fine-grained reductions from approximate counting to decision , 2017, STOC.

[87] I. Anderson. Combinatorics of Finite Sets , 1987 .

[88] Ryan Williams,et al. Probabilistic Polynomials and Hamming Nearest Neighbors , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[89] Akshay Krishnamurthy,et al. A Hierarchical Algorithm for Extreme Clustering , 2017, KDD.

[90] Gustavo Malkomes,et al. Fast Distributed k-Center Clustering with Outliers on Massive Data , 2015, NIPS.

[91] Timothy M. Chan,et al. Polynomial Representations of Threshold Functions and Algorithmic Applications , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[92] Peter Christen,et al. Data Matching , 2012, Data-Centric Systems and Applications.

[93] Krzysztof Onak,et al. A near-optimal sublinear-time algorithm for approximating the minimum vertex cover size , 2011, SODA.

[94] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.

[95] Guoliang Li,et al. MassJoin: A mapreduce-based method for scalable string similarity joins , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[96] Ashish Goel,et al. Efficient distributed locality sensitive hashing , 2012, CIKM.

[97] Boris Aronov,et al. On approximating the depth and related problems , 2005, SODA '05.

[98] Aditya G. Parameswaran,et al. Fuzzy Joins Using MapReduce , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[99] Sudipto Guha,et al. Distributed Partial Clustering , 2017, SPAA.

[100] Hanna M. Wallach,et al. Flexible Models for Microclustering with Application to Entity Resolution , 2016, NIPS.

[101] Ping Li,et al. One Permutation Hashing , 2012, NIPS.

[102] Piotr Indyk,et al. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality , 2012, Theory Comput..

[103] Ronitt Rubinfeld,et al. A sublinear algorithm for weakly approximating edit distance , 2003, STOC '03.

[104] David P. Woodruff,et al. Communication-Optimal Distributed Clustering , 2016, NIPS.

[105] Alon Orlitsky,et al. Estimating the number of defectives with group testing , 2016, 2016 IEEE International Symposium on Information Theory (ISIT).

[106] Tselil Schramm,et al. Near Optimal LP Rounding Algorithm for Correlation Clustering on Complete and Complete k-partite Graphs , 2014, STOC.

[107] Dana Ron,et al. Approximating average parameters of graphs , 2008, Random Struct. Algorithms.

[108] Sergio Cabello,et al. Shortest paths in intersection graphs of unit disks , 2014, Comput. Geom..

[109] Robert Krauthgamer,et al. Embedding the Ulam metric into l1 , 2006, Theory Comput..

[110] Yongfeng Huang,et al. Efficient string similarity join in multi-core and distributed systems , 2017, PloS one.

[111] Aravindan Vijayaraghavan,et al. Bilu-Linial Stable Instances of Max Cut and Minimum Multiway Cut , 2013, SODA.

[112] Nathan Linial,et al. The influence of variables on Boolean functions , 1988, [Proceedings 1988] 29th Annual Symposium on Foundations of Computer Science.

[113] Olgica Milenkovic,et al. Portable and Error-Free DNA-Based Data Storage , 2016, Scientific Reports.

[114] Dana Ron,et al. On approximating the number of k-cliques in sublinear time , 2017, STOC.

[115] Yuval Rabani,et al. Improved lower bounds for embeddings into L1 , 2006, SODA '06.

[116] Mike Paterson,et al. A Faster Algorithm Computing String Edit Distances , 1980, J. Comput. Syst. Sci..

[117] Ravishankar Krishnaswamy,et al. Relax, No Need to Round: Integrality of Clustering Formulations , 2014, ITCS.

[118] L. H. Harper. Global Methods for Combinatorial Isoperimetric Problems , 2004 .

[119] Esko Ukkonen,et al. Algorithms for Approximate String Matching , 1985, Inf. Control..

[120] Alexandr Andoni,et al. Practical and Optimal LSH for Angular Distance , 2015, NIPS.

[121] W. Swallow,et al. Using group testing to estimate a proportion, and to test the binomial model. , 1990, Biometrics.

[122] Anna Pagh,et al. Linear probing with constant independence , 2006, STOC '07.

[123] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.

[124] Dana Ron,et al. Counting stars and other small subgraphs in sublinear time , 2010, SODA '10.

[125] Gad M. Landau,et al. Incremental String Comparison , 1998, SIAM J. Comput..

[126] Jeff Johnson,et al. Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.

[127] A. J. Bernstein,et al. Maximally Connected Arrays on the n-Cube , 1967 .

[128] Dimitris S. Papailiopoulos,et al. Parallel Correlation Clustering on Big Graphs , 2015, NIPS.

[129] Din J. Wasem,et al. Mining of Massive Datasets , 2014 .

[130] Dana Ron,et al. Comparing the strength of query types in property testing: The case of k-colorability , 2012, computational complexity.

[131] Dana Ron,et al. The Power of an Example , 2014, ACM Trans. Comput. Theory.

[132] B. Szegedy,et al. On the logarithimic calculus and Sidorenko's conjecture , 2011, 1107.1153.

[133] Rudolf Ahlswede,et al. Contributions to the geometry of hamming spaces , 1977, Discret. Math..

[134] Jeffrey D. Ullman,et al. Anchor-Points Algorithms for Hamming and Edit Distances Using MapReduce , 2014, ICDT.

[135] Ping Li,et al. b-Bit minwise hashing , 2009, WWW '10.

[136] John H. Lindsey,et al. Assignment of Numbers to Vertices , 1964 .

[137] Sergiu Hart,et al. A note on the edges of the n-cube , 1976, Discret. Math..

[138] R. Ahlswede,et al. Graphs with maximal number of adjacent pairs of edges , 1978 .

[139] Cyrus Rashtchian,et al. Massively-Parallel Similarity Join, Edge-Isoperimetry, and Distance Correlations on the Hypercube , 2016, SODA.

[140] Uriel Feige,et al. On sums of independent random variables with unbounded variance, and estimating the average degree in a graph , 2004, STOC '04.

[141] Béla Bollobás,et al. Sums in the grid , 1996, Discret. Math..

[142] Cyrus Rashtchian,et al. Random access in large-scale DNA data storage , 2018, Nature Biotechnology.

[143] Dan Suciu,et al. Communication Steps for Parallel Query Processing , 2017, J. ACM.

[144] Rafail Ostrovsky,et al. Efficient search for approximate nearest neighbor in high dimensional spaces , 1998, STOC '98.

[145] Fan Chung Graham,et al. Concentration Inequalities and Martingale Inequalities: A Survey , 2006, Internet Math..

[146] Jeremy Buhler,et al. Efficient large-scale sequence comparison by locality-sensitive hashing , 2001, Bioinform..

[147] Guoliang Li,et al. String similarity search and join: a survey , 2016, Frontiers of Computer Science.

[148] P. Erdős,et al. Intersection Theorems for Systems of Finite Sets , 1961 .