Ontology-driven information extraction and integration from heterogeneous distributed autonomous data sources: A federated query centric approach.

The development of high-throughput data acquisition in a number of domains (e.g., biological sciences, space sciences), along with advances in digital storage, computing, and communication technologies, has resulted in unprecedented opportunities for scientific discovery, learning, and decision-making. In practice, the effective use of increasing amounts of data from a variety of sources is complicated by the autonomous and distributed nature of the data sources and by the heterogeneity of the structure and semantics of the data. In many applications, such as scientific discovery, users need to access, interpret, and analyze data from diverse sources, from different perspectives and in different contexts. This thesis presents a novel ontology-driven approach that builds on recent advances in artificial intelligence, databases, and distributed computing to support customizable information extraction and integration in such domains. The proposed approach has been realized as part of a prototype implementation of INDUS, an environment for data-driven knowledge acquisition from heterogeneous, distributed, autonomous data sources in Bioinformatics and Computational Biology.
