Trends in Cleaning Relational Data: Consistency and Deduplication
Ihab F. Ilyas | Xu Chu
[1] Vladimir I. Levenshtein,et al. Binary codes capable of correcting deletions, insertions, and reversals , 1965 .
[2] W. A. Beyer,et al. Some Biological Sequence Metrics , 1976 .
[3] William E. Winkler,et al. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. , 1990 .
[4] Gösta Grahne,et al. The Problem of Incomplete Information in Relational Databases , 1991, Lecture Notes in Computer Science.
[5] Serge Abiteboul,et al. Foundations of Databases , 1994 .
[6] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995, SIGMOD '95.
[7] Charles Elkan,et al. The Field Matching Problem: Algorithms and Applications , 1996, KDD.
[8] William W. Cohen. Integration of heterogeneous databases without common domains using queries based on textual similarity , 1998, SIGMOD '98.
[9] Hannu Toivonen,et al. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies , 1999, Comput. J..
[10] William E. Winkler,et al. The State of Record Linkage and Current Research Problems , 1999 .
[11] Jan Chomicki,et al. Consistent query answers in inconsistent databases , 1999, PODS '99.
[12] Greg Schohn,et al. Less is More: Active Learning with Support Vector Machines , 2000, ICML.
[13] Ahmed K. Elmagarmid,et al. Automating the approximate record-matching process , 2000, Inf. Sci..
[14] Andrew McCallum,et al. Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.
[15] Erhard Rahm,et al. Data Cleaning: Problems and Current Approaches , 2000, IEEE Data Eng. Bull..
[16] Joseph M. Hellerstein,et al. Potter's Wheel: An Interactive Data Cleaning System , 2001, VLDB.
[17] Dennis Shasha,et al. Declarative Data Cleaning: Language, Model, and Algorithms , 2001, VLDB.
[18] Edward L. Robertson,et al. FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract , 2001, DaWaK.
[19] Hwee Tou Ng,et al. A Machine Learning Approach to Coreference Resolution of Noun Phrases , 2001, CL.
[20] Daphne Koller,et al. Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res..
[21] Craig A. Knoblock,et al. Learning object identification rules for information integration , 2001, Inf. Syst..
[22] Claire Gardent,et al. Improving Machine Learning Approaches to Coreference Resolution , 2002, ACL.
[23] Anuradha Bhamidipaty,et al. Interactive deduplication using active learning , 2002, KDD.
[24] Surajit Chaudhuri,et al. Eliminating Fuzzy Duplicates in Data Warehouses , 2002, VLDB.
[25] Jiawei Han,et al. Profile-Based Object Matching for Information Integration , 2003, IEEE Intell. Syst..
[26] Raymond J. Mooney,et al. Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.
[27] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[28] Andrew McCallum,et al. Conditional Models of Identity Uncertainty with Application to Noun Coreference , 2004, NIPS.
[29] Salvatore J. Stolfo,et al. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem , 1998, Data Mining and Knowledge Discovery.
[30] M. Charikar,et al. Aggregating inconsistent information: ranking and clustering , 2005, STOC '05.
[31] Rajeev Rastogi,et al. A cost-based model and effective heuristic for repairing constraints by value modification , 2005, SIGMOD '05.
[32] Jan Chomicki,et al. Minimal-change integrity maintenance using tuple deletions , 2002, Inf. Comput..
[33] Rajeev Motwani,et al. Robust identification of fuzzy duplicates , 2005, 21st International Conference on Data Engineering (ICDE'05).
[34] Laura M. Haas,et al. Clio grows up: from research prototype to industrial tool , 2005, SIGMOD '05.
[35] Raymond J. Mooney,et al. Adaptive Blocking: Learning to Scale Up Record Linkage , 2006, Sixth International Conference on Data Mining (ICDM'06).
[36] Craig A. Knoblock,et al. Learning Blocking Schemes for Record Linkage , 2006, AAAI.
[37] Pedro M. Domingos,et al. Entity Resolution with Markov Logic , 2006, Sixth International Conference on Data Mining (ICDM'06).
[38] Divesh Srivastava,et al. Record linkage: similarity measures and algorithms , 2006, SIGMOD Conference.
[39] A. Karr. Exploratory Data Mining and Data Cleaning , 2006 .
[40] Leopoldo E. Bertossi,et al. Complexity of Consistent Query Answering in Databases Under Cardinality-Based and Incremental Repair Semantics , 2006, ICDT.
[41] Ahmed K. Elmagarmid,et al. Duplicate Record Detection: A Survey , 2007, IEEE Transactions on Knowledge and Data Engineering.
[42] R. Stockdale,et al. Data Quality Information and Decision Making: A Healthcare Case Study , 2007 .
[43] Shuai Ma,et al. Improving Data Quality: Consistency and Accuracy , 2007, VLDB.
[44] Wenfei Fan,et al. Conditional Functional Dependencies for Data Cleaning , 2007, 2007 IEEE 23rd International Conference on Data Engineering.
[45] Lise Getoor,et al. Collective entity resolution in relational data , 2007, TKDD.
[46] Surajit Chaudhuri,et al. Example-driven design of efficient record matching queries , 2007, VLDB.
[47] Jean-Marc Petit,et al. Unary and n-ary inclusion dependency discovery in relational databases , 2009, Journal of Intelligent Information Systems.
[48] William E. Winkler,et al. Data quality and record linkage techniques , 2007 .
[49] Felix Naumann,et al. Industry-scale duplicate detection , 2008, Proc. VLDB Endow..
[50] Micha Elsner,et al. You Talking to Me? A Corpus and Algorithm for Conversation Disentanglement , 2008, ACL.
[51] Joseph M. Hellerstein,et al. Quantitative Data Cleaning for Large Databases , 2008 .
[52] Renée J. Miller,et al. Discovering data quality rules , 2008, Proc. VLDB Endow..
[53] Bei Yu,et al. On generating near-optimal tableaux for conditional functional dependencies , 2008, Proc. VLDB Endow..
[54] Phokion G. Kolaitis,et al. Repair checking in inconsistent databases: algorithms and complexity , 2009, ICDT '09.
[55] Avishek Saha,et al. Metric Functional Dependencies , 2009, 2009 IEEE 25th International Conference on Data Engineering.
[56] Felix Naumann,et al. Data Fusion – Resolving Data Conflicts for Integration , 2009 .
[57] Laks V. S. Lakshmanan,et al. On approximating optimum repairs for functional dependency violations , 2009, ICDT '09.
[58] Jianzhong Li,et al. Reasoning about Record Matching Rules , 2009, Proc. VLDB Endow..
[59] M. Elsner,et al. Bounding and Comparing Methods for Correlation Clustering Beyond ILP , 2009, ILP 2009.
[60] Felix Naumann,et al. Data fusion , 2009, CSUR.
[61] Burr Settles,et al. Active Learning Literature Survey , 2009 .
[62] Christopher Ré,et al. Large-Scale Deduplication with Constraints Using Dedupalog , 2009, 2009 IEEE 25th International Conference on Data Engineering.
[63] Lei Chen,et al. Discovering matching dependencies , 2009, CIKM.
[64] Raghav Kaushik,et al. On active learning of record matching packages , 2010, SIGMOD Conference.
[65] Robert L. Surowka. Modeling and querying possible repairs in duplicate detection , 2010 .
[66] Lukasz Golab,et al. Sampling the repairs of functional dependency violations under hard constraints , 2010, Proc. VLDB Endow..
[67] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[68] Felix Naumann,et al. An Introduction to Duplicate Detection , 2010, An Introduction to Duplicate Detection.
[69] Hector Garcia-Molina,et al. Entity resolution with evolving rules , 2010, Proc. VLDB Endow..
[70] Shuai Ma,et al. Detecting inconsistencies in distributed data , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).
[71] Chen Li,et al. Efficient parallel set-similarity joins using MapReduce , 2010, SIGMOD Conference.
[72] Leopoldo E. Bertossi,et al. Database Repairing and Consistent Query Answering , 2011, Database Repairing and Consistent Query Answering.
[73] Floris Geerts,et al. Discovering Conditional Functional Dependencies , 2011, IEEE Transactions on Knowledge and Data Engineering.
[74] Paul Zikopoulos,et al. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data , 2011 .
[75] Suman Nath,et al. Tracing data errors with view-conditioned causality , 2011, SIGMOD '11.
[76] Jeffrey Heer,et al. Wrangler: interactive visual specification of data transformation scripts , 2011, CHI.
[77] Emanuel Santos,et al. Support for User Involvement in Data Cleaning , 2011, DaWaK.
[78] Ahmed K. Elmagarmid,et al. Guided data repair , 2011, Proc. VLDB Endow..
[79] Jianzhong Li,et al. Towards certain fixes with editing rules and master data , 2010, The VLDB Journal.
[80] Renée J. Miller,et al. A unified model for data and constraint repair , 2011, 2011 IEEE 27th International Conference on Data Engineering.
[81] Jeffrey Heer,et al. Proactive wrangling: mixed-initiative end-user programming of data transformation scripts , 2011, UIST.
[82] Jean-Marc Petit,et al. Discovering Editing Rules For Data Cleaning , 2012, VLDB 2012.
[83] Sergio Greco,et al. Incomplete Data and Data Dependencies in Relational Databases , 2012, Incomplete Data and Data Dependencies in Relational Databases.
[84] Andreas Thor,et al. Dedoop: Efficient Deduplication with Hadoop , 2012, Proc. VLDB Endow..
[85] Ashwin Machanavajjhala,et al. Entity Resolution: Theory, Practice & Open Challenges , 2012, Proc. VLDB Endow..
[86] Andreas Thor,et al. Load Balancing for MapReduce-based Entity Resolution , 2011, 2012 IEEE 28th International Conference on Data Engineering.
[87] Wenfei Fan,et al. Foundations of Data Quality Management , 2012, Foundations of Data Quality Management.
[88] Tim Kraska,et al. CrowdER: Crowdsourcing Entity Resolution , 2012, Proc. VLDB Endow..
[89] Ashwin Machanavajjhala,et al. An automatic blocking mechanism for large-scale de-duplication tasks , 2012, CIKM '12.
[90] Michael Stonebraker,et al. A Demonstration of DBWipes: Clean as You Query , 2012, Proc. VLDB Endow..
[91] Tim Kraska,et al. Leveraging transitive relations for crowdsourced joins , 2013, SIGMOD '13.
[92] Ahmed Eldawy,et al. NADEEF: a commodity data cleaning system , 2013, SIGMOD '13.
[93] Lukasz Golab,et al. Sampling from repairs of conditional functional dependency violations , 2014, The VLDB Journal.
[94] Wenfei Fan,et al. Inferring data currency and consistency for conflict resolution , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[95] Samuel Madden,et al. Scorpion: Explaining Away Outliers in Aggregate Queries , 2013, Proc. VLDB Endow..
[96] Paolo Papotti,et al. The LLUNATIC Data-Cleaning Framework , 2013, Proc. VLDB Endow..
[97] Lukasz Golab,et al. On the relative trust between inconsistent data and inaccurate constraints , 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[98] Paolo Papotti,et al. Holistic data cleaning: Putting violations into context , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[99] Paolo Papotti,et al. Discovering Denial Constraints , 2013, Proc. VLDB Endow..
[100] Michael Stonebraker,et al. Data Curation at Scale: The Data Tamer System , 2013, CIDR.
[101] Paolo Papotti,et al. That's All Folks! LLUNATIC Goes Open Source , 2014, Proc. VLDB Endow..
[102] Wenfei Fan,et al. Detecting Errors in Numeric Attributes , 2014, WAIM.
[103] Yeye He,et al. ClusterJoin: A Similarity Joins Framework using Map-Reduce , 2014, Proc. VLDB Endow..
[104] Jeffrey F. Naughton,et al. Corleone: hands-off crowdsourcing for entity matching , 2014, SIGMOD Conference.
[105] Renée J. Miller,et al. Continuous data cleaning , 2014, 2014 IEEE 30th International Conference on Data Engineering.
[106] Paolo Papotti,et al. RuleMiner: Data quality rules discovery , 2014, 2014 IEEE 30th International Conference on Data Engineering.
[107] Nilesh N. Dalvi,et al. Crowdsourcing Algorithms for Entity Resolution , 2014, Proc. VLDB Endow..
[108] Shuai Ma,et al. Extending inclusion dependencies with conditions , 2014, Theor. Comput. Sci..
[109] Paolo Papotti,et al. Mapping and cleaning , 2014, 2014 IEEE 30th International Conference on Data Engineering.
[110] Nan Tang,et al. Towards dependable data repairing with fixing rules , 2014, SIGMOD Conference.
[111] Paolo Papotti,et al. Descriptive and prescriptive data cleaning , 2014, SIGMOD Conference.
[112] Shuai Ma,et al. Interaction between Record Matching and Data Repairing , 2014, JDIQ.
[113] Wenfei Fan,et al. Conflict resolution with data currency and consistency , 2014, ACM J. Data Inf. Qual..
[114] Jianzhong Li,et al. Incremental Detection of Inconsistencies in Distributed Data , 2014, IEEE Trans. Knowl. Data Eng..
[115] Divesh Srivastava,et al. Incremental Record Linkage , 2014, Proc. VLDB Endow..
[116] Tim Kraska,et al. A sample-and-clean framework for fast and accurate query processing on dirty data , 2014, SIGMOD Conference.
[117] Paolo Papotti,et al. BigDansing: A System for Big Data Cleansing , 2015, SIGMOD Conference.
[118] Paolo Papotti,et al. KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing , 2015, SIGMOD Conference.
[119] Jeffrey Heer,et al. Predictive Interaction for Data Transformation , 2015, CIDR.
[120] Felix Naumann,et al. Divide & Conquer-based Inclusion Dependency Discovery , 2015, Proc. VLDB Endow..
[121] Nan Tang,et al. Proof positive and negative in data cleaning , 2015, 2015 IEEE 31st International Conference on Data Engineering.