Large-Scale Data Pollution with Apache Spark
Norbert Ritter | Niklas Wilcke | Fabian Panse | Kai Hildebrandt
[1] Peter Christen. Development and user experiences of an open source data cleaning, deduplication and record linkage system, 2009, SKDD.
[2] Michael J. Franklin, et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, 2012, NSDI.
[3] Erhard Rahm, et al. Data Cleaning: Problems and Current Approaches, 2000, IEEE Data Eng. Bull.
[4] Peter Christen, et al. Accurate Synthetic Generation of Realistic Personal Information, 2009, PAKDD.
[5] Peter Christen, et al. Flexible and extensible generation and corruption of personal data, 2013, CIKM.
[6] Kenneth Ward Church, et al. Probability scoring for spelling correction, 1991.
[7] Alon Y. Halevy, et al. Principles of Data Integration, 2012.
[8] Hector Garcia-Molina, et al. Evaluating entity resolution results, 2010, Proc. VLDB Endow.
[9] Salvatore J. Stolfo, et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem, 1998, Data Mining and Knowledge Discovery.
[10] Ahmed K. Elmagarmid, et al. Duplicate Record Detection: A Survey, 2007, IEEE Transactions on Knowledge and Data Engineering.
[11] Peter Christen, et al. GeCo: an online personal data generator and corruptor, 2013, CIKM.
[12] Felix Naumann, et al. Profiling relational data: a survey, 2015, The VLDB Journal.
[13] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.
[14] Ashwin Machanavajjhala, et al. Network sampling, 2013, KDD.
[15] Divesh Srivastava, et al. Global detection of complex copying relationships between sources, 2010, Proc. VLDB Endow.
[16] Anish Das Sarma, et al. Data Cleaning: A Practical Perspective, 2013, Data Cleaning: A Practical Perspective.
[17] Peter Christen, et al. Data Matching, 2012, Data-Centric Systems and Applications.
[18] Divesh Srivastava, et al. Big data integration, 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[19] Vijay V. Raghavan, et al. NoSQL Systems for Big Data Management, 2014, 2014 IEEE World Congress on Services.
[20] Felix Naumann, et al. An Introduction to Duplicate Detection, 2010, An Introduction to Duplicate Detection.
[21] Norbert Ritter, et al. Scalable data management: NoSQL data stores in research and practice, 2016, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).