An Overview of End-to-End Entity Resolution for Big Data

One of the most critical tasks for improving data quality and increasing the reliability of data analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to the...
