Biomolecular annotation prediction through information integration

In recent years, an increasingly large amount of biomedical and biomolecular information and data has become available to researchers, allowing the scientific community to infer new knowledge and reach new objectives. As this information grows, so does the difficulty of managing it efficiently. In this paper, we present a short overview of our proposal to address this problem: GPDW, a prototype multi-organism Genomic and Proteomic Data Warehouse based at Politecnico di Milano. We also present the computational methods we implemented to exploit it for biomolecular annotation prediction. Experimental studies demonstrated the effectiveness of our resource and methods.
