Performance Comparison of Three Spark-Based Implementations of Parallel Entity Resolution
[1] Holden Karau, et al. High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark, 2017.
[2] Avigdor Gal, et al. Comparative Analysis of Approximate Blocking Techniques for Entity Resolution, 2016, Proc. VLDB Endow.
[3] Gunter Saake, et al. Cloud-Scale Entity Resolution: Current State and Open Challenges, 2018, Open J. Big Data.
[4] Peter Christen, et al. Data Matching, 2012, Data-Centric Systems and Applications.
[5] William W. Cohen, et al. A Comparison of String Metrics for Matching Names and Records, 2003.
[6] Joseph K. Bradley, et al. Spark SQL: Relational Data Processing in Spark, 2015, SIGMOD Conference.
[7] Marcos Barreto, et al. A Spark-based Workflow for Probabilistic Record Linkage of Healthcare Data, 2015, EDBT/ICDT Workshops.
[8] Peter Christen, et al. GeCo: an online personal data generator and corruptor, 2013, CIKM.
[9] Carlos Eduardo S. Pires, et al. An efficient spark-based adaptive windowing for entity matching, 2017, J. Syst. Softw.
[10] Gunter Saake, et al. Exploring Spark-SQL-Based Entity Resolution Using the Persistence Capability, 2018, BDAS.
[11] Chen Wang, et al. Parallel Duplicate Detection in Adverse Drug Reaction Databases with Spark, 2016, EDBT.
[12] Peter Christen, et al. Flexible and extensible generation and corruption of personal data, 2013, CIKM.