Fast, incremental, and scalable all pairs similarity search
[1] Mehran Sahami, et al. Evaluating similarity measures: a large-scale study in the Orkut social network, 2005, KDD '05.
[2] Jeffrey Xu Yu, et al. Efficient similarity joins for near duplicate detection, 2008, WWW.
[3] Xuemin Lin, et al. Top-k Set Similarity Joins, 2009, 2009 IEEE 25th International Conference on Data Engineering.
[4] Nagiza F. Samatova, et al. Incremental all pairs similarity search for varying similarity thresholds, 2009, SNA-KDD '09.
[5] Vipin Kumar, et al. Introduction to Data Mining (First Edition), 2005.
[6] M. Newman, et al. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality, 2001, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.
[7] Ophir Frieder, et al. Collection statistics for fast duplicate document detection, 2002, TOIS.
[8] Jeffrey Dean, et al. Challenges in building large-scale information retrieval systems: invited talk, 2009, WSDM '09.
[9] Arkady B. Zaslavsky, et al. Efficiency of data structures for detecting overlaps in digital documents, 2001, Proceedings of the 24th Australian Computer Science Conference (ACSC 2001).
[10] Sunita Sarawagi, et al. Efficient set joins on similarity predicates, 2004, SIGMOD '04.
[11] Steve Chien, et al. Semantic similarity between search engine queries using temporal correlation, 2005, WWW '05.
[12] Ron Sacks-Davis, et al. Fast Document Ranking for Large Scale Information Retrieval, 1994, ADB.
[13] Divesh Srivastava, et al. Fast Indexes and Algorithms for Set Similarity Selection Queries, 2008, 2008 IEEE 24th International Conference on Data Engineering.
[14] Reda Alhajj, et al. High performance computing for spatial outliers detection using parallel wavelet transform, 2007, Intell. Data Anal.
[15] Rakesh Agrawal, et al. Parallel Mining of Association Rules, 1996, IEEE Trans. Knowl. Data Eng.
[16] Hanan Samet, et al. Incremental distance join algorithms for spatial databases, 1998, SIGMOD '98.
[17] Geoffrey Zweig, et al. Syntactic Clustering of the Web, 1997, Comput. Networks.
[18] Nagiza F. Samatova, et al. pR: Lightweight, Easy-to-Use Middleware to Plugin Parallel Analytical Computing with R, 2009, IKE.
[19] Dongwon Lee, et al. Parallel linkage, 2007, CIKM '07.
[20] Howard R. Turtle, et al. Query Evaluation: Strategies and Optimizations, 1995, Inf. Process. Manag.
[21] Divyakant Agrawal, et al. Detectives: detecting coalition hit inflation attacks in advertising networks streams, 2007, WWW '07.
[22] Krishna P. Gummadi, et al. Measurement and analysis of online social networks, 2007, IMC '07.
[23] W. Bruce Croft, et al. Optimization strategies for complex queries, 2005, SIGIR '05.
[24] Stephen Blott, et al. What's wrong with high-dimensional similarity search?, 2008, Proc. VLDB Endow.
[25] Bin Wang, et al. VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams, 2007, VLDB.
[26] Varun Chandola, et al. Anomaly detection: A survey, 2009, CSUR.
[27] Doug Beeferman, et al. Agglomerative clustering of a search engine query log, 2000, KDD '00.
[28] Xuemin Lin, et al. Ed-Join: an efficient algorithm for similarity joins with edit distance constraints, 2008, Proc. VLDB Endow.
[29] Hanan Samet, et al. A Fast Similarity Join Algorithm Using Graphics Processing Units, 2008, 2008 IEEE 24th International Conference on Data Engineering.
[30] Seung-won Hwang, et al. Minimal probing: supporting expensive predicates for top-k queries, 2002, SIGMOD '02.
[31] Shenghuo Zhu, et al. Learning multiple graphs for document recommendations, 2008, WWW.
[32] Piotr Indyk, et al. Similarity Search in High Dimensions via Hashing, 1999, VLDB.
[33] Michel Barlaud, et al. Fast k nearest neighbor search using GPU, 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[34] Mehran Sahami, et al. A web-based kernel function for measuring the similarity of short text snippets, 2006, WWW '06.
[35] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms, 2002, STOC '02.
[36] Pabitra Mitra, et al. Selective hypertext induced topic search, 2006, WWW '06.
[37] Andrei Z. Broder, et al. Graph structure in the Web, 2000, Comput. Networks.
[38] Ronald L. Rivest, et al. Introduction to Algorithms, 1990.
[39] Sergei Vassilvitskii, et al. Top-k aggregation using intersections of ranked inputs, 2009, WSDM '09.
[40] Bil Lewis, et al. Multithreaded Programming With PThreads, 1997.
[41] Jiangchuan Liu, et al. Statistics and Social Network of YouTube Videos, 2008, 2008 16th International Workshop on Quality of Service.
[42] Hector Garcia-Molina, et al. Building a scalable and accurate copy detection mechanism, 1996, DL '96.
[43] Hector Garcia-Molina, et al. Finding Near-Replicas of Documents and Servers on the Web, 1998, WebDB.
[44] Marc Najork, et al. On the evolution of clusters of near-duplicate Web pages, 2003, Proceedings of the First Latin American Web Congress (LA-WEB 2003).
[45] Roberto J. Bayardo, et al. Scaling up all pairs similarity search, 2007, WWW '07.
[46] Chris Buckley, et al. Optimization of inverted vector searches, 1985, SIGIR '85.
[47] Dimitris Papadias, et al. Top-k spatial joins, 2005, IEEE Transactions on Knowledge and Data Engineering.
[48] Pavel Zezula, et al. A distributed incremental nearest neighbor algorithm, 2007.
[49] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.
[50] Ravi Kumar, et al. Discovering Large Dense Subgraphs in Massive Graphs, 2005, VLDB.
[51] Marc Najork, et al. Detecting phrase-level duplication on the world wide web, 2005, SIGIR '05.
[52] Soumen Chakrabarti, et al. Mining the Web: discovering knowledge from hypertext data, 2002.
[53] James H. Anderson, et al. On the Design and Implementation of a Cache-Aware Multicore Real-Time Scheduler, 2009, 2009 21st Euromicro Conference on Real-Time Systems.
[54] Nagiza F. Samatova, et al. Fast Matching for All Pairs Similarity Search, 2009, 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology.
[55] Eibe Frank, et al. An Empirical Comparison of Exact Nearest Neighbour Algorithms, 2007, PKDD.
[56] Jaewoo Kang, et al. Selective Approach To Handling Topic Oriented Tasks On The World Wide Web, 2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining.
[57] Surajit Chaudhuri, et al. A Primitive Operator for Similarity Joins in Data Cleaning, 2006, 22nd International Conference on Data Engineering (ICDE'06).
[58] Hans-Peter Kriegel, et al. The R*-tree: an efficient and robust access method for points and rectangles, 1990, SIGMOD '90.
[59] Salvatore J. Stolfo, et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem, 1998, Data Mining and Knowledge Discovery.
[60] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[61] Monika Henzinger, et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms, 2006, SIGIR.
[62] Ray A. Jarvis, et al. Clustering Using a Similarity Measure Based on Shared Near Neighbors, 1973, IEEE Transactions on Computers.
[63] Christian Böhm, et al. High performance clustering based on the similarity join, 2000, CIKM '00.
[64] Ravi Kumar, et al. Structure and evolution of online social networks, 2006, KDD '06.
[65] Fenglou Mao, et al. Parallel Clustering Algorithm for Large Data Sets with Applications in Bioinformatics, 2009, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[66] D. Geer, et al. Chip makers turn to multicore processors, 2005, Computer.