A query language for biological networks

MOTIVATION: Many areas of modern biology are concerned with the management, storage, visualization, comparison and analysis of networks, but no appropriate query language for such complex data structures yet exists.

RESULTS: We have designed and implemented the pathway query language (PQL) for querying large protein interaction or pathway databases. PQL is based on a simple graph data model with extensions reflecting properties of biological objects. Queries match subgraphs in the database based on node properties and paths between nodes. The syntax is easy to learn for anybody familiar with SQL. As an important feature, a query may require a certain structure in the database to exist but return a different subgraph. We have tested PQL queries on networks of up to 16,000 nodes and found it to scale very well.

AVAILABILITY: The code is available on request from the author.
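
The idea of matching on node properties and paths, while returning a subgraph different from the one matched, can be illustrated with a minimal Python sketch. This is neither PQL syntax nor the author's implementation; the toy network, the names `NODES`, `EDGES`, `reachable_within` and `match_pairs`, and the bounded breadth-first search are all assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical toy network: node id -> properties, plus directed edges.
NODES = {
    "A": {"type": "protein", "name": "kinase1"},
    "B": {"type": "protein", "name": "adaptor"},
    "C": {"type": "protein", "name": "tf1"},
    "D": {"type": "compound", "name": "atp"},
}
EDGES = {"A": ["B", "D"], "B": ["C"], "C": [], "D": []}


def reachable_within(start, max_len):
    """Return {node: distance} for nodes reachable from start in 1..max_len steps."""
    dist = {}
    queue = deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if d == max_len:
            continue  # do not expand beyond the path-length bound
        for nxt in EDGES.get(node, []):
            if nxt not in dist and nxt != start:
                dist[nxt] = d + 1
                queue.append((nxt, d + 1))
    return dist


def match_pairs(source_pred, target_pred, max_len):
    """Require a path a -> ... -> b to exist, but return only the direct pairs (a, b).

    The matched structure (the whole path) differs from the returned
    structure (one derived edge per pair of end points).
    """
    result = []
    for a, props in NODES.items():
        if not source_pred(props):
            continue
        for b in reachable_within(a, max_len):
            if target_pred(NODES[b]):
                result.append((a, b))
    return result


pairs = match_pairs(
    lambda p: p["name"] == "kinase1",   # condition on the source node
    lambda p: p["type"] == "protein",   # condition on the target node
    max_len=2,                          # path of at most two edges
)
print(pairs)
```

In this sketch the intermediate node on the path from `A` to `C` is needed for the match but does not appear in the result, which mirrors the distinction between the required structure and the returned subgraph.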
