Data Cleaning: Problems and Current Approaches
Erhard Rahm | Hong Hai Do
[1] Krishna Bharat,et al. Improved algorithms for topic distillation in a hyperlinked environment , 1998, SIGIR '98.
[2] Gerard Salton,et al. Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..
[3] Oren Etzioni,et al. Web document clustering: a feasibility demonstration , 1998, SIGIR '98.
[4] Jeremy A. Hylton,et al. Identifying and Merging Related Bibliographic Records , 1996 .
[5] Matthias Jarke,et al. A Model for Data Warehouse Operational Processes , 2000, CAiSE.
[6] Dennis Shasha,et al. AJAX: an extensible data cleaning tool , 2000, SIGMOD '00.
[7] José Oncina,et al. Learning Stochastic Regular Grammars by Means of a State Merging Method , 1994, ICGI.
[8] William W. Cohen. Integration of heterogeneous databases without common domains using queries based on textual similarity , 1998, SIGMOD '98.
[9] Chris Clifton,et al. SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks , 2000, Data Knowl. Eng..
[10] Arturo Crespo,et al. A Survey Of Semi-Automatic Extraction And Transformation , 1994 .
[11] Tova Milo,et al. Using Schema Matching to Simplify Heterogeneous Data Translation , 1998, VLDB.
[12] Umeshwar Dayal,et al. An Overview of Repository Technology , 1994, VLDB.
[13] Panos Vassiliadis,et al. Gulliver in the land of data warehousing: practical experiences and observations of a researcher , 2000, DMDW.
[14] Fran eDaniela. Flores,et al. Declaratively cleaning your data using AJAX , 2000 .
[15] Andrei Z. Broder,et al. Mirror, Mirror on the Web: A Study of Host Pairs with Replicated Content , 1999, Comput. Networks.
[16] Michael Stonebraker,et al. Open enterprise data integration , 1999 .
[17] Howard B. Newcombe,et al. Handbook of record linkage: methods for health and statistical studies, administration, and business , 1988 .
[18] Matthias Jarke,et al. Fundamentals of Data Warehouses , 2000, Springer Berlin Heidelberg.
[19] Gio Wiederhold,et al. Mediators in the architecture of future information systems , 1992, Computer.
[20] Donald E. Knuth,et al. Fast Pattern Matching in Strings , 1977, SIAM J. Comput..
[21] Serge Abiteboul,et al. Tools for Data Translation and Integration , 1999, IEEE Data Eng. Bull..
[22] Charles Elkan,et al. An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records , 1997, DMKD.
[23] Patrick A. V. Hall,et al. Approximate String Matching , 1994, Encyclopedia of Algorithms.
[24] K. Minton. Extraction Patterns for Information Extraction Tasks : A Survey , 1999 .
[25] Hongjun Lu,et al. Cleansing Data for Mining and Warehousing , 1999, DEXA.
[26] Erhard Rahm,et al. On Metadata Interoperability in Data Warehouses , 2000 .
[27] Brad Adelberg,et al. NoDoSE—a tool for semi-automatically extracting structured and semistructured data from text documents , 1998, SIGMOD '98.
[28] Stuart E. Madnick,et al. Inter-database instance identification in composite information systems , 1989, Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences. Volume III: Decision Support and Knowledge Based Systems Track.
[29] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[30] H. B. Newcombe,et al. Automatic linkage of vital records. , 1959, Science.
[31] Vipul Kashyap,et al. Semantic and schematic similarities between database objects: a context-based approach , 1996, The VLDB Journal.
[32] Oren Etzioni,et al. A Grammar Inference Algorithm for the World Wide Web , 2002 .
[33] Surajit Chaudhuri,et al. An overview of data warehousing and OLAP technology , 1997, SIGMOD Rec..
[34] Robert S. Boyer,et al. A fast string searching algorithm , 1977, CACM.
[35] Ramakrishnan Srikant,et al. Mining generalized association rules , 1995, Future Gener. Comput. Syst..
[36] Michael Stonebraker,et al. Database research: achievements and opportunities into the 21st century , 1996, SIGMOD Rec..
[37] M. S. Waterman,et al. Identification of common molecular subsequences. , 1981, Journal of molecular biology.
[38] Craig A. Knoblock,et al. A hierarchical approach to wrapper induction , 1999, AGENTS '99.
[39] Joseph M. Hellerstein,et al. Potter's Wheel: An Interactive Framework for Data Transformation and Cleaning , 2001, VLDB.
[40] Jeffrey D. Ullman,et al. Set Merging Algorithms , 1973, SIAM J. Comput..
[41] Joseph M. Hellerstein,et al. Potter's Wheel: An interactive framework for data cleaning , 2000 .
[42] Diego Calvanese,et al. Information integration: conceptual modeling and reasoning support , 1998, Proceedings. 3rd IFCIS International Conference on Cooperative Information Systems.
[43] M. W. Du,et al. An Approach to Designing Very Fast Approximate String Matching Algorithms , 1994, IEEE Trans. Knowl. Data Eng..
[44] Peter Willett,et al. Recent trends in hierarchic document clustering: A critical review , 1988, Inf. Process. Manag..
[45] Pedro M. Domingos,et al. Learning Source Description for Data Integration , 2000, WebDB.
[46] Vincent Kanade,et al. Clustering Algorithms , 2021, Wireless RF Energy Transfer in the Massive IoT Era.
[47] Elke A. Rundensteiner. Letter from the Special Issue Editor , 1999, IEEE Data Eng. Bull..
[48] Nicholas Kushmerick,et al. Wrapper induction: Efficiency and expressiveness , 2000, Artif. Intell..
[49] Stefano Spaccapietra,et al. Issues and approaches of database integration , 1998, CACM.
[50] Usama M. Fayyad,et al. Mining Databases: Towards Algorithms for Knowledge Discovery , 1998, IEEE Data Eng. Bull..
[51] Hector Garcia-Molina,et al. Finding replicated Web collections , 2000, SIGMOD '00.
[52] Chun-Nan Hsu,et al. Generating Finite-State Transducers for Semi-Structured Data Extraction from the Web , 1998, Inf. Syst..
[53] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[54] Andrei Z. Broder,et al. A Comparison of Techniques to Find Mirrored Hosts on the WWW , 2000, IEEE Data Eng. Bull..
[55] Nicholas Kushmerick,et al. Regression testing for wrapper maintenance , 1999, AAAI/IAAI.
[56] A. A. Brooks,et al. Experiment in computer-assisted duplicate checking , 1976 .
[57] Charles Elkan,et al. The Field Matching Problem: Algorithms and Applications , 1996, KDD.
[58] Veda C. Storey,et al. A Framework for Analysis of Data Quality Research , 1995, IEEE Trans. Knowl. Data Eng..
[59] Alvaro E. Monge,et al. Adaptive detection of approximately duplicate database records and the database integration approach to information discovery , 1998 .
[60] Jordan Lampe,et al. Theoretical and Empirical Comparisons of Approximate String Matching Algorithms , 1992, CPM.
[61] Raymond J. Mooney,et al. Active Learning for Natural Language Parsing and Information Extraction , 1999, ICML.
[62] William W. Cohen. Recognizing Structure in Web Pages using Similarity Queries , 1999, AAAI/IAAI.
[63] Dennis Shasha,et al. An extensible Framework for Data Cleaning , 2000, Proceedings of 16th International Conference on Data Engineering.
[64] Ted E. Senator,et al. The Financial Crimes Enforcement Network AI System (FAIS) Identifying Potential Money Laundering from Reports of Large Cash Transactions , 1995, AI Mag..
[65] Dayne Freitag,et al. Boosted Wrapper Induction , 2000, AAAI/IAAI.
[66] Vladimir I. Levenshtein,et al. Binary codes capable of correcting deletions, insertions, and reversals , 1965 .
[67] Avrim Blum,et al. The Bottleneck , 2021, Monopsony Capitalism.
[68] Hector Garcia-Molina,et al. Finding near-replicas of documents on the Web , 1999 .
[69] Raffaele Giancarlo,et al. Data structures and algorithms for approximate string matching , 1988, J. Complex..
[70] Philip A. Bernstein,et al. Meta-Data Support for Data Transformations Using Microsoft Repository , 1999, IEEE Data Eng. Bull..
[71] Mauricio Antonio Hernandez-Sherrington. A generalization of band joins and the merge/purge problem , 1996 .
[72] Shamkant B. Navathe,et al. An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.
[73] Laks V. S. Lakshmanan,et al. SchemaSQL - A Language for Interoperability in Relational Multi-Database Systems , 1996, VLDB.
[74] C. Sapia,et al. On Supporting the Data Warehouse Design by Data Mining Techniques , 1999 .
[75] Mary Roth,et al. Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources , 1997, VLDB.
[76] Dennis Shasha,et al. Declaratively Cleaning your Data with AJAX , 2000, BDA.
[77] Laura M. Haas,et al. Transforming Heterogeneous Data with Database Middleware: Beyond Integration , 1999, IEEE Data Eng. Bull..
[78] Raymond J. Mooney,et al. Relational Learning of Pattern-Match Rules for Information Extraction , 1999, CoNLL.
[79] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995, SIGMOD '95.
[80] Jon M. Kleinberg,et al. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text , 1998, Comput. Networks.
[81] Kristina Lerman,et al. Learning the Common Structure of Data , 2000, AAAI/IAAI.
[82] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[83] Michael Stonebraker,et al. Independent, Open Enterprise Data Integration , 1999, IEEE Data Eng. Bull..
[84] Ramakrishnan Srikant,et al. Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.
[85] Nicholas Kushmerick,et al. Wrapper Induction for Information Extraction , 1997, IJCAI.
[86] Edward T. O'Neill,et al. A Methodology for Sampling the World Wide Web , 2001 .
[87] Matthias Jarke,et al. Data Warehouse Refreshment , 2000 .
[88] Richard Y. Wang,et al. Toward quality data: An attribute-based approach , 2014, Decis. Support Syst..
[89] James L. Peterson,et al. Computer programs for detecting and correcting spelling errors , 1980, CACM.