Autoplex: Automated Discovery of Content for Virtual Databases

Most virtual database systems are suited to environments in which the set of member information sources is small and stable. Consequently, they scale poorly, mainly because incorporating a new information source into the virtual database is complex and costly. In this paper we describe a system, called Autoplex, that uses machine learning techniques to automate the discovery of new content for virtual database systems. Autoplex assumes that several information sources have already been incorporated ("mapped") into the virtual database system by human experts, as is done in standard virtual database systems. It learns the features of these examples and then applies this knowledge to new candidate sources, trying to infer views that "resemble" the examples. We report initial results from the Autoplex project.
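The abstract does not say which learner or features Autoplex uses, so the following is only a minimal sketch of the general idea, not the system's actual method. It assumes a naive Bayes classifier over character 3-grams of cell values, with a uniform prior over attributes and add-one smoothing. The names `ColumnClassifier`, `tokens`, and the sample attributes and values are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokens(value):
    """Split a cell value into lowercase character 3-grams (an assumed feature choice)."""
    s = str(value).lower()
    return [s[i:i + 3] for i in range(max(1, len(s) - 2))]


class ColumnClassifier:
    """Naive Bayes over cell-value tokens; one class per virtual-database attribute."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # attribute -> token frequencies
        self.vocabulary = set()

    def train(self, attribute, column_values):
        """Learn from a column that an expert has already mapped to `attribute`."""
        for value in column_values:
            for t in tokens(value):
                self.token_counts[attribute][t] += 1
                self.vocabulary.add(t)

    def score(self, attribute, column_values):
        """Log-likelihood of the column under `attribute`, with add-one smoothing."""
        counts = self.token_counts[attribute]
        total = sum(counts.values())
        vocab_size = len(self.vocabulary)
        log_p = 0.0
        for value in column_values:
            for t in tokens(value):
                log_p += math.log((counts[t] + 1) / (total + vocab_size))
        return log_p

    def best_attribute(self, column_values):
        """Return the learned attribute that a candidate column most resembles."""
        return max(self.token_counts, key=lambda a: self.score(a, column_values))


# Hypothetical expert-mapped examples, followed by a column from a new candidate source.
clf = ColumnClassifier()
clf.train("phone", ["703-555-0100", "202-555-0199"])
clf.train("city", ["Fairfax", "Arlington", "Vienna"])
guess = clf.best_attribute(["571-555-0123"])
```

In this sketch, a column from a new source is assigned to whichever already-mapped attribute gives its values the highest likelihood. A full system would also need to decide which rows and which combinations of columns form a view, which this example leaves out.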
