GenoLink: a graph-based querying and browsing system for investigating the function of genes and proteins

Background: A large variety of biological data can be represented by graphs. These graphs can be constructed from heterogeneous data coming from genomic and post-genomic technologies, but there is still a need for tools to explore and analyse such graphs. This paper describes GenoLink, a software platform for the graphical querying and exploration of graphs.

Results: GenoLink provides a generic framework for representing and querying data graphs. This framework provides a graph data structure, a graph query engine that retrieves sub-graphs from the entire data graph, and several graphical interfaces for expressing such queries and exploring their results. A query consists of a graph pattern with constraints attached to its vertices and edges. A query result is the set of all sub-graphs of the entire data graph that are isomorphic to the pattern and satisfy the constraints. The graph data structure does not rely upon any particular data model but can dynamically accommodate any user-supplied data model. For genomic and post-genomic applications, however, we provide a default data model and several parsers for the most popular data sources. GenoLink does not require any programming skill, since all operations on graphs and the analysis of the results can be carried out through dedicated graphical interfaces.

Conclusion: GenoLink is a generic and interactive tool that allows biologists to graphically explore various sources of information. GenoLink is distributed either as a standalone application or as a component of the Genostar/Iogma platform. Both distributions are free for academic research and teaching purposes and can be requested at academy@genostar.com. For-profit companies can obtain a commercial licence form at info@genostar.com. See also http://www.genostar.org.
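The query semantics stated in the Results (a graph pattern with constraints on vertices and edges, answered by every matching sub-graph of the data graph) can be illustrated with a small sketch. This is not GenoLink's query engine or data model: the `Graph` class, the `find_matches` function, the `kind` attribute and the toy gene/protein data are all hypothetical, and the matching uses plain backtracking over directed graphs, where a match is an injective mapping of pattern vertices to data vertices that preserves every pattern edge (extra edges in the data graph are allowed).

```python
from typing import Callable, Dict, Hashable, Iterator, Optional, Tuple

Attrs = Dict[str, object]
Constraint = Callable[[Attrs], bool]
Edge = Tuple[Hashable, Hashable]


class Graph:
    """Directed graph with an attribute dictionary on each vertex and edge."""

    def __init__(self) -> None:
        self.vertices: Dict[Hashable, Attrs] = {}
        self.edges: Dict[Edge, Attrs] = {}

    def add_vertex(self, v: Hashable, **attrs: object) -> None:
        self.vertices[v] = attrs

    def add_edge(self, u: Hashable, v: Hashable, **attrs: object) -> None:
        self.edges[(u, v)] = attrs


def find_matches(
    pattern: Graph,
    data: Graph,
    vertex_constraints: Optional[Dict[Hashable, Constraint]] = None,
    edge_constraints: Optional[Dict[Edge, Constraint]] = None,
) -> Iterator[Dict[Hashable, Hashable]]:
    """Yield each mapping {pattern vertex: data vertex} that satisfies the query."""
    vertex_constraints = vertex_constraints or {}
    edge_constraints = edge_constraints or {}
    order = list(pattern.vertices)

    def always(_attrs: Attrs) -> bool:
        # Default constraint: accept anything.
        return True

    def edges_ok(mapping: Dict[Hashable, Hashable], p: Hashable) -> bool:
        # Check the pattern edges that touch p and have both endpoints mapped.
        for (a, b) in pattern.edges:
            if p in (a, b) and a in mapping and b in mapping:
                attrs = data.edges.get((mapping[a], mapping[b]))
                if attrs is None:
                    return False  # the data graph has no corresponding edge
                if not edge_constraints.get((a, b), always)(attrs):
                    return False
        return True

    def extend(mapping: Dict[Hashable, Hashable]) -> Iterator[Dict[Hashable, Hashable]]:
        if len(mapping) == len(order):
            yield dict(mapping)  # copy, because mapping is mutated while backtracking
            return
        p = order[len(mapping)]
        used = set(mapping.values())
        for d, attrs in data.vertices.items():
            if d in used or not vertex_constraints.get(p, always)(attrs):
                continue
            mapping[p] = d
            if edges_ok(mapping, p):
                yield from extend(mapping)
            del mapping[p]

    yield from extend({})


# Toy data graph (hypothetical): two genes, the proteins they encode,
# and one protein-protein interaction.
data = Graph()
for gene in ("g1", "g2"):
    data.add_vertex(gene, kind="gene")
for protein in ("p1", "p2"):
    data.add_vertex(protein, kind="protein")
data.add_edge("g1", "p1", kind="encodes")
data.add_edge("g2", "p2", kind="encodes")
data.add_edge("p1", "p2", kind="interacts")

# Query: a gene G linked to a protein P by an "encodes" edge.
pattern = Graph()
pattern.add_vertex("G")
pattern.add_vertex("P")
pattern.add_edge("G", "P")

results = list(
    find_matches(
        pattern,
        data,
        vertex_constraints={
            "G": lambda a: a.get("kind") == "gene",
            "P": lambda a: a.get("kind") == "protein",
        },
        edge_constraints={("G", "P"): lambda a: a.get("kind") == "encodes"},
    )
)
for m in results:
    print(m)
```

Checking vertex constraints before extending a partial mapping prunes the search early, which matters because sub-graph matching is expensive in the worst case; a production engine would likely add stronger filtering than this sketch does.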
