SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks

One step in interoperating among heterogeneous databases is semantic integration: identifying relationships between attributes or classes in different database schemas. SEMantic INTegrator (SEMINT) is a tool based on neural networks to assist in identifying attribute correspondences in heterogeneous databases. SEMINT supports access to a variety of database systems and utilizes both schema information and data contents to produce rules for matching corresponding attributes automatically. This paper provides theoretical background and implementation details of SEMINT. Experimental results from large and complex real databases are presented. We discuss the effectiveness of SEMINT and our experiences with attribute correspondence identification in various environments. © 2000 Elsevier Science B.V. All rights reserved.
