Bioinformatics Tools and Databases for Genomics Research

Bioinformatics involves the development of statistical methods and computer software for the acquisition, storage, analysis, and visualization of biological information. The European Molecular Biology Laboratory (EMBL), the National Center for Biotechnology Information (NCBI), and the DNA Data Bank of Japan (DDBJ) have served researchers worldwide for decades, and the databases and tools hosted by these institutes continue to grow rapidly. Analytical tools such as BLAST and CLUSTAL have been the workhorses of sequence searching and analysis, and both programs have been maintained since the 1990s. Many other tools, including AutoSNP, SNP2CAPS, TASSEL, and STRUCTURE, are useful for analyzing sequence and genotype data and for drawing biologically meaningful conclusions from those analyses. Databases such as GenBank, Phytozome, the EMBL Nucleotide Sequence Database, Swiss-Prot, and the UniProt Knowledgebase store vast amounts of nucleotide and protein sequence information that is readily accessible to the public. In addition, the Kyoto Encyclopedia of Genes and Genomes (KEGG) is designed to help researchers understand higher-order biological functions by integrating gene, protein, and metabolic pathway information. This chapter describes bioinformatics tools and databases relevant to plant breeding and discusses their key features and applications.
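To illustrate the kind of analysis SNP discovery tools perform, the sketch below shows a simple redundancy-based approach of the sort AutoSNP applies to expressed sequence tag data: a position is reported as a candidate SNP only when each allele is supported by several independent reads, so that single-read sequencing errors are filtered out. This is a minimal sketch, not AutoSNP's implementation. It assumes the reads have already been clustered and aligned into equal-length strings with `-` marking gaps, and the function name `call_snps` and the `min_redundancy` threshold of 2 are hypothetical choices made for illustration.

```python
from collections import Counter


def call_snps(aligned_reads, min_redundancy=2):
    """Return candidate SNP positions from pre-aligned, equal-length reads.

    Hypothetical helper: a column is reported when at least two different
    bases each occur in `min_redundancy` or more reads. Gaps ('-') and
    ambiguous calls ('N') are ignored.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must be aligned to the same length")

    snps = []
    for pos in range(length):
        # Count the base observed at this column in every read.
        counts = Counter(read[pos].upper() for read in aligned_reads)
        # Drop gaps and ambiguous bases before counting alleles.
        for ignored in "-N":
            counts.pop(ignored, None)
        # Keep only alleles supported by enough reads (the redundancy filter).
        supported = {base: n for base, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            snps.append((pos, supported))
    return snps


reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTGC",
    "ACGTACG-AC",
]
for pos, alleles in call_snps(reads):
    print(pos, alleles)  # prints: 3 {'T': 3, 'A': 2}
```

Position 3 is reported because both T and A appear in at least two reads, whereas the lone G at position 8 is treated as a likely sequencing error and skipped. Lowering `min_redundancy` to 1 would report it as well, at the cost of more false positives.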
