Probabilistic parallelisation of blocking non-matched records for big data
[1] Peter Christen,et al. Febrl - A Parallel Open Source Data Linkage System: http://datamining.anu.edu.au/linkage.html , 2004, PAKDD.
[2] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995, SIGMOD '95.
[3] Surajit Chaudhuri,et al. Example-driven design of efficient record matching queries , 2007, VLDB.
[4] Raymond J. Mooney,et al. Adaptive Blocking: Learning to Scale Up Record Linkage , 2006, Sixth International Conference on Data Mining (ICDM'06).
[5] Vasilis Efthymiou,et al. Big data entity resolution: From highly to somehow similar entity descriptions in the Web , 2015, 2015 IEEE International Conference on Big Data (Big Data).
[6] Dongyao Wu,et al. Building Pipelines for Heterogeneous Execution Environments for Big Data Processing , 2016, IEEE Software.
[7] Dongwon Lee,et al. Parallel linkage , 2007, CIKM '07.
[8] Peter Christen,et al. A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication , 2012, IEEE Transactions on Knowledge and Data Engineering.
[9] Guoqiang Li,et al. LogProv: Logging events as provenance of big data analytics pipelines with trustworthiness , 2016, 2016 IEEE International Conference on Big Data (Big Data).
[10] Andreas Thor,et al. Dedoop: Efficient Deduplication with Hadoop , 2012, Proc. VLDB Endow.
[11] George Papastefanatos,et al. Parallel meta-blocking: Realizing scalable entity resolution over large, heterogeneous data , 2015, 2015 IEEE International Conference on Big Data (Big Data).
[12] Georgia Koutrika,et al. Entity resolution with iterative blocking , 2009, SIGMOD Conference.
[13] Wolfgang Nejdl,et al. Meta-Blocking: Taking Entity Resolution to the Next Level , 2014, IEEE Transactions on Knowledge and Data Engineering.
[14] Andreas Thor,et al. Multi-pass sorted neighborhood blocking with MapReduce , 2012, Computer Science - Research and Development.
[15] Craig A. Knoblock,et al. Learning domain-independent string transformation weights for high accuracy object identification , 2002, KDD.
[16] Raghav Kaushik,et al. On active learning of record matching packages , 2010, SIGMOD Conference.
[17] Anuradha Bhamidipaty,et al. Interactive deduplication using active learning , 2002, KDD.
[18] Howard B. Newcombe,et al. Handbook of record linkage: methods for health and statistical studies, administration, and business , 1988 .
[19] Ahmed K. Elmagarmid,et al. Duplicate Record Detection: A Survey , 2007, IEEE Transactions on Knowledge and Data Engineering.
[20] Daphne Koller,et al. Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res.
[21] Mikhail Bilenko and Raymond J. Mooney. On Evaluation and Training-Set Construction for Duplicate Detection , 2003 .
[22] Raymond K. Wong,et al. Unsupervised Blocking of Imbalanced Datasets for Record Matching , 2016, WISE.
[23] Craig A. Knoblock,et al. Learning Blocking Schemes for Record Linkage , 2006, AAAI.