Biomine: A Network-Structured Resource of Biological Entities for Link Prediction

Biomine is a biological graph database constructed from public databases. Its entities (vertices) include biological concepts, such as genes, proteins, tissues, processes and phenotypes, as well as scientific articles. The relations (edges) between these entities correspond to real-world phenomena such as "a gene codes for a protein" or "an article refers to a phenotype". Biomine also provides tools for querying the graph for connections and visualizing them interactively. We describe the Biomine graph database, discuss link discovery in such biological graphs, and review possible link prediction measures. Biomine currently contains over 1 million entities and over 8 million relations between them, with a focus on human genetics. It is available online and can be queried for connecting subgraphs between biological entities.
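To make "querying the graph for connections" concrete, here is a minimal sketch of one simple link measure on a typed entity graph. It assumes each edge carries a weight in (0, 1] that is interpreted as a probability. This is an assumption for illustration and is not stated above. Under that assumption, the strength of a connection is scored as the probability of the single best path between two entities. Maximizing a product of probabilities is the same as minimizing the sum of their negative logarithms, so Dijkstra's algorithm applies. The entity names, relation labels, and weights are hypothetical toy data. `build_graph` and `best_path` are illustrative helpers, not part of Biomine's tools.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical toy data: entity names, relation labels and weights are
# illustrative only, not real Biomine content. Each weight is ASSUMED to be
# a probability in (0, 1] that the relation holds.
EDGES = [
    ("gene:A", "protein:A", "codes_for", 0.9),
    ("protein:A", "process:P", "participates_in", 0.8),
    ("process:P", "phenotype:X", "associated_with", 0.5),
    ("article:1", "gene:A", "refers_to", 0.6),
    ("article:1", "phenotype:X", "refers_to", 0.7),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> list of (neighbor, relation, prob)."""
    graph = defaultdict(list)
    for u, v, relation, prob in edges:
        if not 0.0 < prob <= 1.0:
            raise ValueError(f"edge weight must be a probability in (0, 1]: {prob}")
        graph[u].append((v, relation, prob))
        graph[v].append((u, relation, prob))
    return graph


def best_path(graph, source, target):
    """Return (probability, path) for the most probable path from source to target.

    A path's probability is the product of its edge probabilities. Edge costs
    -log(p) are non-negative, so Dijkstra's algorithm finds the path that
    minimizes the cost sum, which maximizes the product. Returns (0.0, []) if
    target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break  # target's cost is final once it is popped
        for neighbor, _relation, prob in graph.get(node, []):
            new_cost = cost - math.log(prob)
            if new_cost < dist.get(neighbor, math.inf):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in dist:
        return 0.0, []
    # Walk predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


if __name__ == "__main__":
    g = build_graph(EDGES)
    p, route = best_path(g, "gene:A", "phenotype:X")
    print(f"{p:.2f}", " -> ".join(route))
```

In this toy graph, the shorter route through `article:1` (0.6 × 0.7 = 0.42) outscores the longer gene, protein and process route (0.9 × 0.8 × 0.5 = 0.36). A single best path ignores parallel connections, so measures that aggregate over many paths can rank entity pairs differently.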
