Genomic Annotation Prediction Based on Integrated Information

In recent years, an increasingly large amount of biomedical and biomolecular information and data has become available to researchers, allowing the scientific community to infer new knowledge and pursue new objectives. As this information grows, so does the difficulty of managing it efficiently. In this paper, we present a short overview of our proposed solution to this problem: a prototype multi-organism Genomic and Proteomic Data Warehouse (GPDW), based at Politecnico di Milano. We also present the computational methods we implemented to exploit it. Experimental studies on datasets demonstrated the effectiveness of our resource and methods.
