Paper information - Extensible and similarity-based grouping for data integration

The general concept of grouping and aggregation appears to be a fitting paradigm for various issues in data integration, but in its common form of equality-based grouping, a number of problems remain unsolved. We propose a generic approach to user-defined grouping as part of a SQL extension, allowing for more complex grouping functions, for instance the integration of data mining algorithms. Furthermore, we discuss high-level language primitives for common applications.
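To make the contrast with equality-based grouping concrete, the following is a minimal Python sketch, not the SQL extension the paper proposes. It assumes a hypothetical user-supplied predicate `similar` (built on the standard library's `difflib.SequenceMatcher`) and forms groups as the transitive closure of that predicate; the function names, the 0.8 threshold, and the sample rows are all illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical similarity predicate: case-insensitive string ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_by_similarity(rows, key, predicate=similar):
    """Group rows whose keys are connected under `predicate`.

    Groups are the transitive closure of the pairwise predicate,
    computed with a small union-find. This is one possible grouping
    strategy, assumed here purely for illustration.
    """
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair of rows; merge their sets when keys are similar.
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if predicate(key(rows[i]), key(rows[j])):
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[find(i)].append(row)
    return list(groups.values())


# Made-up sample rows: two spellings of one title plus an unrelated title.
rows = [
    {"title": "Data Cube: A Relational Aggregation Operator", "cites": 3},
    {"title": "Data cube - a relational aggregation operator", "cites": 5},
    {"title": "The merge/purge problem for large databases", "cites": 2},
]
groups = group_by_similarity(rows, key=lambda r: r["title"])

# Aggregate per group, as SUM would after a GROUP BY.
totals = [sum(r["cites"] for r in g) for g in groups]
```

An equality-based GROUP BY on `title` would keep the first two rows in separate groups; with the similarity predicate they fall into one group, and the aggregate is computed over both.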

Gunter Saake | Kai-Uwe Sattler | Eike Schallehn

[1] Hamid Pirahesh, et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, 1996, Data Mining and Knowledge Discovery.

[2] Jaideep Srivastava, et al. Entity identification in database integration, 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[3] C. Lee Giles, et al. CiteSeer: an automatic citation indexing system, 1998, DL '98.

[4] Charles Elkan, et al. An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records, 1997, DMKD.

[5] Kenneth A. Ross, et al. Querying Multiple Features of Groups in Relational Databases, 1996, VLDB.

[6] Norbert Fuhr, et al. Probabilistic Datalog - a logic for powerful retrieval methods, 1995, SIGIR '95.

[7] Gunter Saake, et al. Extensible Grouping and Aggregation for Data Reconciliation, 2001, EFIS.

[8] Kai-Uwe Sattler, et al. A data preparation framework based on a multidatabase language, 2001, Proceedings 2001 International Database Engineering and Applications Symposium.

[9] Daniela Florescu, et al. AJAX: An Extensible Data Cleaning Tool, 2000, SIGMOD Conference.

[10] T. H. Merrett, et al. Tries for Approximate String Matching, 1996, IEEE Trans. Knowl. Data Eng.

[11] Wen-Syan Li. Knowledge Gathering and Matching in Heterogeneous Databases, 1995.

[12] Salvatore J. Stolfo, et al. The merge/purge problem for large databases, 1995, SIGMOD '95.

[13] Michael Stonebraker, et al. Independent, Open Enterprise Data Integration, 1999, IEEE Data Eng. Bull.

[14] Diego Calvanese, et al. A Principled Approach to Data Integration and Reconciliation in Data Warehousing, 1999, DMDW.

[15] Sumit Sarkar, et al. A probabilistic relational model and algebra, 1996, TODS.

[16] Gunter Saake, et al. Adding Conflict Resolution Features to a Query Language for Database Federations, 2000, Australas. J. Inf. Syst.

[17] Roger King, et al. Using Object Matching and Materialization to Integrate Heterogeneous Databases, 1999, CoopIS.

[18] Jeremy A. Hylton, et al. Identifying and Merging Related Bibliographic Records, 1996.

[19] William W. Cohen. Integration of heterogeneous databases without common domains using queries based on textual similarity, 1998, SIGMOD '98.

[20] Forouzan Golshani, et al. Proceedings of the Eighth International Conference on Data Engineering, 1992.

[21] Scott B. Huffman, et al. Heuristic Joins to Integrate Structured Heterogeneous Data, 1995.

[22] Charles Elkan, et al. The Field Matching Problem: Algorithms and Applications, 1996, KDD.

[23] William Kent, et al. The breakdown of the information model in multi-database systems, 1991, SGMD.

[24] Arbee L. P. Chen, et al. A probabilistic approach to query processing in heterogeneous database systems, 1992, [1992 Proceedings] Second International Workshop on Research Issues on Data Engineering: Transaction and Query Processing.

[25] Goetz Graefe, et al. Query evaluation techniques for large databases, 1993, CSUR.

[26] Wen-Syan Li. Knowledge Gathering and Matching in Heterogeneous Databases, 1995.

[27] Carlo Zaniolo, et al. Using SQL to Build New Aggregates and Extenders for Object-Relational Systems, 2000, VLDB.

[28] Hector Garcia-Molina, et al. Duplicate Removal in Information Dissemination, 1998.