Cloud-Scale Entity Resolution: Current State and Open Challenges
[1] Yasin N. Silva, et al. MapReduce-based similarity join for metric spaces, 2012, Cloud-I '12.
[2] Xiao Chen. Crowdsourcing Entity Resolution: a Short Overview and Open Issues, 2015, GvD.
[3] Chen Li, et al. Efficient record linkage in large data sets, 2003, Eighth International Conference on Database Systems for Advanced Applications (DASFAA 2003), Proceedings.
[4] Gautam Shroff, et al. Graph-Parallel Entity Resolution using LSH & IMM, 2014, EDBT/ICDT Workshops.
[5] Bo Yang, et al. Parallel NoSQL Entity Resolution Approach with MapReduce, 2015, 2015 International Conference on Intelligent Networking and Collaborative Systems.
[6] Jimmy J. Lin, et al. Pairwise Document Similarity in Large Collections with MapReduce, 2008, ACL.
[7] S. Vasavi, et al. Hadoop Framework For Entity Resolution Within High Velocity Streams, 2016.
[8] Georgia Koutrika, et al. Entity resolution with iterative blocking, 2009, SIGMOD Conference.
[9] George Papastefanatos, et al. Parallel meta-blocking for scaling entity resolution over big heterogeneous data, 2017, Inf. Syst.
[10] Ian H. Witten, et al. Managing gigabytes, 1994.
[11] Avigdor Gal, et al. Comparative Analysis of Approximate Blocking Techniques for Entity Resolution, 2016, Proc. VLDB Endow.
[12] Carlos Eduardo S. Pires, et al. Adaptive sorted neighborhood blocking for entity matching with MapReduce, 2015, SAC.
[13] L. R. Dice. Measures of the Amount of Ecologic Association Between Species, 1945.
[14] Peng Wang, et al. An efficient MapReduce algorithm for similarity join in metric spaces, 2016, The Journal of Supercomputing.
[15] Kostas Tzoumas, et al. Introduction to Apache Flink: Stream Processing for Real Time and Beyond, 2016.
[16] Salvatore J. Stolfo, et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem, 1998, Data Mining and Knowledge Discovery.
[17] Erkki Sutinen, et al. Indexing text with approximate q-grams, 2000, J. Discrete Algorithms.
[18] Andreas Thor, et al. Learning-based entity resolution with MapReduce, 2011, CloudDB '11.
[19] Carlos Eduardo S. Pires, et al. Improving load balancing for MapReduce-based entity matching, 2013, 2013 IEEE Symposium on Computers and Communications (ISCC).
[20] Hanan Samet, et al. Metric space similarity joins, 2008, TODS.
[21] Andreas Thor, et al. Load Balancing for MapReduce-based Entity Resolution, 2011, 2012 IEEE 28th International Conference on Data Engineering.
[22] Anuradha Bhamidipaty, et al. Interactive deduplication using active learning, 2002, KDD.
[23] Michael Stonebraker, et al. A comparison of approaches to large-scale data analysis, 2009, SIGMOD Conference.
[24] Andreas Thor, et al. Multi-pass sorted neighborhood blocking with MapReduce, 2012, Computer Science - Research and Development.
[25] Avigdor Gal. Uncertain entity resolution: re-evaluating entity resolution in the big data era: tutorial, 2014, VLDB 2014.
[26] Gerhard Weikum, et al. LINDA: distributed web-of-data-scale entity matching, 2012, CIKM.
[27] Peter Christen, et al. A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication, 2012, IEEE Transactions on Knowledge and Data Engineering.
[28] Andreas Thor, et al. Dedoop: Efficient Deduplication with Hadoop, 2012, Proc. VLDB Endow.
[29] Michael Stonebraker, et al. MapReduce and parallel DBMSs: friends or foes?, 2010, CACM.
[30] Jayant Madhavan, et al. Reference reconciliation in complex information spaces, 2005, SIGMOD '05.
[31] Jiajin Le, et al. An Efficient Parallel Top-k Similarity Join for Massive Multidimensional Data Using Spark, 2015.
[32] Ahmed K. Elmagarmid, et al. Duplicate Record Detection: A Survey, 2007, IEEE Transactions on Knowledge and Data Engineering.
[33] Carlos Alberto Heuser, et al. A fast approach for parallel deduplication on multicore processors, 2011, SAC '11.
[34] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.
[35] Erhard Rahm, et al. Data Partitioning for Parallel Entity Matching, 2010, ArXiv.
[36] Jianmin Wang, et al. MapDupReducer: detecting near duplicates over massive datasets, 2010, SIGMOD Conference.
[37] Christos Faloutsos, et al. V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors, 2012, Proc. VLDB Endow.
[38] Gang Chen, et al. Metric Similarity Joins Using MapReduce, 2017, IEEE Transactions on Knowledge and Data Engineering.
[39] Ihab F. Ilyas, et al. Distributed Data Deduplication, 2016, Proc. VLDB Endow.
[40] Bo Yang, et al. Large-Scale Schema-Free Data Deduplication Approach with Adaptive Sliding Window Using MapReduce, 2015, Comput. J.
[41] Marcos Barreto, et al. A Spark-based Workflow for Probabilistic Record Linkage of Healthcare Data, 2015, EDBT/ICDT Workshops.
[42] Thomas Seidl, et al. MR-DSJ: Distance-Based Self-Join for Large-Scale Vector Data Analysis with MapReduce, 2013, BTW.
[43] David Guy Brizan, et al. A Survey of Entity Resolution and Record Linkage Methodologies, 2015, Communications of the IIMA.
[44] Ivan P. Fellegi, et al. A Theory for Record Linkage, 1969.
[45] Felix Naumann, et al. Adaptive Windows for Duplicate Detection, 2012, 2012 IEEE 28th International Conference on Data Engineering.
[46] Douglas W. Oard, et al. Improving text classification for oral history archives with temporal domain knowledge, 2007, SIGIR.
[47] Andrew McCallum, et al. Efficient clustering of high-dimensional data sets with application to reference matching, 2000, KDD '00.
[48] Andreas Thor, et al. Block-based load balancing for entity resolution with MapReduce, 2011, CIKM '11.
[49] Carlos Eduardo S. Pires, et al. An efficient spark-based adaptive windowing for entity matching, 2017, J. Syst. Softw.
[50] P. Jaccard. Etude comparative de la distribution florale dans une portion des Alpes et des Jura, 1901.
[51] Wagner Meira, et al. A Scalable Parallel Deduplication Algorithm, 2007, 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07).
[52] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.
[53] Yeye He, et al. ClusterJoin: A Similarity Joins Framework using Map-Reduce, 2014, Proc. VLDB Endow.
[54] Jennifer Widom, et al. GPS: a graph processing system, 2013, SSDBM.
[55] W. Winkler. Overview of Record Linkage and Current Research Directions, 2006.
[56] Andreas Thor, et al. Don't match twice: redundancy-free similarity computation with MapReduce, 2013, DanaC '13.
[57] Lifang Gu, et al. Record Linkage: Current Practice and Future Directions, 2003.
[58] Dongwon Lee, et al. Parallel linkage, 2007, CIKM '07.
[59] Stuart J. Russell, et al. Object Identification: A Bayesian Analysis with Application to Traffic Surveillance, 1998, Artif. Intell.
[60] Yasin N. Silva, et al. Exploiting MapReduce-based similarity joins, 2012, SIGMOD Conference.
[61] Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals, 1965.
[62] Ming-Yen Lin, et al. A load-balanced mapreduce algorithm for blocking-based entity-resolution with multiple keys, 2014.
[63] Ashwin Machanavajjhala, et al. Network sampling, 2013, KDD.
[64] Keizo Oyama, et al. A Fast Linkage Detection Scheme for Multi-Source Information Integration, 2005, International Workshop on Challenges in Web Information Retrieval and Integration.
[65] Jeffrey Xu Yu, et al. Efficient similarity joins for near-duplicate detection, 2011, TODS.
[66] Peter Christen, et al. Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface, 2008, KDD.
[67] Guoliang Li, et al. Efficient parallel partition-based algorithms for similarity search and join with edit distance constraints, 2013, EDBT '13.
[68] Felix Naumann, et al. Scalable Iterative Graph Duplicate Detection, 2012, IEEE Transactions on Knowledge and Data Engineering.
[69] Erhard Rahm, et al. Parallel Entity Resolution with Dedoop, 2012, Datenbank-Spektrum.
[70] Guoqiang Li, et al. Unsupervised blocking and probabilistic parallelisation for record matching of distributed big data, 2017, The Journal of Supercomputing.
[71] Hakan Kardes, et al. Graph-based Approaches for Organization Entity Resolution in MapReduce, 2013, TextGraphs@EMNLP.
[72] Hector Garcia-Molina, et al. P-Swoosh: Parallel Algorithm for Generic Entity Resolution, 2006.
[73] Seif Haridi, et al. Apache Flink™: Stream and Batch Processing in a Single Engine, 2015, IEEE Data Eng. Bull.
[74] Salvatore J. Stolfo, et al. The merge/purge problem for large databases, 1995, SIGMOD '95.
[75] Erhard Rahm, et al. Frameworks for entity matching: A comparison, 2010, Data Knowl. Eng.
[76] Ashwin Machanavajjhala, et al. Entity Resolution: Theory, Practice & Open Challenges, 2012, Proc. VLDB Endow.
[77] Andrew Borthwick, et al. Dynamic Record Blocking: Efficient Linking of Massive Databases in MapReduce, 2012.
[78] Hector Garcia-Molina, et al. D-Swoosh: A Family of Algorithms for Generic, Distributed Entity Resolution, 2007, 27th International Conference on Distributed Computing Systems (ICDCS '07).
[79] Abraham Silberschatz, et al. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, 2009, Proc. VLDB Endow.
[80] Yuan Xue, et al. Scalable load balancing for mapreduce-based record linkage, 2013, 2013 IEEE 32nd International Performance Computing and Communications Conference (IPCCC).
[81] Xiaoyong Du, et al. Efficient Duplicate Detection on Cloud Using a New Signature Scheme, 2011, WAIM.
[82] Chen Li, et al. Efficient parallel set-similarity joins using MapReduce, 2010, SIGMOD Conference.
[83] Thomas Seidl, et al. PHiDJ: Parallel similarity self-join for high-dimensional vector data with MapReduce, 2014, 2014 IEEE 30th International Conference on Data Engineering.
[84] Ranieri Baraglia, et al. Document Similarity Self-Join with MapReduce, 2010, 2010 IEEE International Conference on Data Mining.
[85] Peter Christen, et al. Data Matching, 2012, Data-Centric Systems and Applications.
[86] Vasilis Efthymiou, et al. Entity resolution in the web of data, 2013, Entity Resolution in the Web of Data.
[87] Ralph Weischedel, et al. Performance Measures for Information Extraction, 2007.
[88] Guoliang Li, et al. MassJoin: A mapreduce-based method for scalable string similarity joins, 2014, 2014 IEEE 30th International Conference on Data Engineering.