Index-Driven XML Data Integration to Support Functional Genomics

We identify a new type of data integration problem that arises in functional genomics research in the context of large-scale experiments involving arrays, two-dimensional protein gels, and mass spectrometry. We examine the current practice of data analysis, which involves repeated web queries iterating over long lists of gene or protein names. We propose a new approach to this problem, applicable to data sets stored in XML format. Using an XML index that we construct, we discover data redundancies and remove them from the results returned by a query. We combine XML indexing with queries carried out on top of relational tables. We believe our approach could support semi-automated data integration such as that required in the interpretation of large-scale biological experiments.
