Topology Search over Biological Databases

We introduce the notion of a data topology and the problem of topology search over databases. A data topology summarizes the set of all possible relationships that connect a given set of entities. Topology search enables users to search for data topologies that relate entities in a large database, and to effectively summarize and rank these relationships. Using topology search over a biological database, users can ask, for example, how transcription factor proteins are related to DNA sequences in humans. However, detecting topologies in large databases is a difficult problem because entities can be connected in multiple ways. In this paper, we formalize the notion of data topologies, develop efficient algorithms for computing data topologies based on user queries, and evaluate our algorithms using a real biological database, the Biozon database (www.biozon.org).
