Integration and Mining of Genomic Annotations: Experiences and Perspectives in GFINDer Data Warehousing

Many tasks in bioinformatics require the comprehensive evaluation of different types of data, which are generally available in distributed and heterogeneous data sources. Several approaches, including federated databases, multidatabases, and mediator-based systems, have been proposed to integrate data from multiple sources. Yet data warehousing seems to be the most adequate approach when large amounts of data need to be integrated, efficiently processed, and mined comprehensively. To support biological interpretation of high-throughput gene lists, we previously developed GFINDer (Genome Functional INtegrated Discoverer, http://www.bioinformatics.polimi.it/GFINDer/), a web server that statistically analyzes and mines functional and phenotypic gene annotations, sparsely available in numerous databanks, to highlight annotation categories significantly enriched or depleted in the considered gene lists. GFINDer includes a data warehouse that integrates gene and protein annotations of several organisms, expressed through various controlled terminologies and ontologies. Here, we describe the GFINDer data warehouse and discuss the lessons learned during its construction and its five years of maintenance and development.
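As an illustration of what "significantly enriched or depleted" can mean for an annotation category, the following minimal sketch computes one-sided p-values under a hypergeometric model. The abstract does not state which statistical tests GFINDer applies, so the choice of the hypergeometric distribution, the function names, and the example numbers are all assumptions made for illustration only.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k annotated genes in a list of n genes,
    drawn without replacement from N genes of which K are annotated."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_value(k, N, K, n):
    """One-sided p-value P(X >= k): chance of seeing at least k
    annotated genes in the list (over-representation)."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


def depletion_p_value(k, N, K, n):
    """One-sided p-value P(X <= k): chance of seeing at most k
    annotated genes in the list (under-representation)."""
    lower = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lower, k + 1))


if __name__ == "__main__":
    # Hypothetical numbers: 20000 genes in the reference set, 400 of them
    # annotated to one category, and a gene list of 150 genes containing 12.
    N, K, n, k = 20000, 400, 150, 12
    print("enrichment p-value:", enrichment_p_value(k, N, K, n))
    print("depletion p-value:", depletion_p_value(k, N, K, n))
```

In practice, one such test is run per annotation category, so the resulting p-values would normally need a multiple-testing correction before any category is reported as significant.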
