ANEXdb: an integrated animal ANnotation and microarray EXpression database

To annotate the sequence elements on microarrays used for transcriptional profiling experiments in livestock species, researchers must currently either rely on the sparse direct annotations available for these species or create their own. ANEXdb (http://www.anexdb.org) is an open-source web application that provides integrated access to two databases: ExpressDB, which houses microarray expression data, and AnnotDB, which houses EST annotation data. ExpressDB currently supports storage and querying of Affymetrix-based expression data, as well as retrieval of experiments in a form ready for NCBI-GEO submission; these services are available online. AnnotDB currently houses the Iowa Porcine Assembly (IPA), a novel assembly of approximately 1.6 million unique porcine expressed sequence reads. The IPA consists of 140,087 consensus sequences, termed the Iowa Tentative Consensus (ITC) sequences, and 103,888 singletons. The IPA has been annotated by transferring information from homologs identified through sequence alignment to NCBI RefSeq. These annotated sequences have been mapped to the Affymetrix porcine array elements, providing annotation for 22,569 of the 23,937 (94%) porcine-specific probe sets; 19,253 of the 23,937 (80%) are linked to an NCBI RefSeq entry. The ITC sequences have also been mined for sequence variation, providing evidence for up to 202,383 SNPs, 62,048 deletions, and 958 insertions in porcine expressed sequence. Together, these resources provide a single location for obtaining both annotation of, and sequence variation in, the genes found to be differentially expressed in porcine expression experiments, which may permit identification of causal variants in such genes of interest. The ANEXdb source code is available from SourceForge.net.
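The coverage figures quoted above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts given in the abstract; the constant and function names are hypothetical and are not part of ANEXdb.

```python
# Counts copied from the abstract; all names here are illustrative only.
TOTAL_PROBE_SETS = 23_937   # porcine-specific Affymetrix probe sets
ANNOTATED = 22_569          # probe sets annotated via the IPA
REFSEQ_LINKED = 19_253      # probe sets linked to an NCBI RefSeq entry
ITC_CONSENSUS = 140_087     # consensus (ITC) sequences in the IPA
SINGLETONS = 103_888        # singleton sequences in the IPA


def coverage_percent(part: int, total: int) -> float:
    """Return part/total as a percentage."""
    return 100.0 * part / total


annotated_pct = coverage_percent(ANNOTATED, TOTAL_PROBE_SETS)
refseq_pct = coverage_percent(REFSEQ_LINKED, TOTAL_PROBE_SETS)
ipa_sequences = ITC_CONSENSUS + SINGLETONS

print(f"annotated probe sets: {annotated_pct:.1f}%")
print(f"RefSeq-linked probe sets: {refseq_pct:.1f}%")
print(f"IPA sequences (consensus + singletons): {ipa_sequences}")
```

Both percentages round to the values reported in the abstract only when the full set of 23,937 probe sets is used as the denominator, which suggests that the 80% figure refers to all porcine-specific probe sets rather than to the annotated subset.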
