The role of informatics in glycobiology research with special emphasis on automatic interpretation of MS spectra.

This paper reviews the current status of bioinformatics applications and databases in glycobiology. It covers both tools built on established bioinformatics approaches and informatics for glycobiology, where glycan structures must be explicitly encoded. The availability of the complete human genome sequence has considerably accelerated the systematic identification of previously unknown glycogenes across many areas of glycobiology, using well-established bioinformatics tools. Many new glyco-related data collections and informatics tools have been developed, and several efforts have begun to cross-link and reference the data deposited in distributed databases. Even so, informatics for glycobiology and glycomics remains poorly developed compared with genomics and proteomics. The review also covers the development of algorithms for the automatic interpretation of MS spectra. Such interpretation is currently a severe bottleneck that hampers the rapid and reliable analysis of MS data in high-throughput glycomics projects. A comprehensive list of web resources is given, and several directions for future progress are discussed. There is an urgent need for decentralised facilities for depositing experimentally determined glycan structures. At the same time, agreed standards for the structural description of glycans, and formats for the associated data, have to be established. Finally, glycomics must become more closely integrated with genomics and proteomics.
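To make the MS interpretation problem concrete, the simplest step in automatic annotation is to match an observed peak mass against candidate monosaccharide compositions. The sketch below is a minimal, hypothetical illustration, not the method of any tool reviewed here. It assumes underivatised glycans observed as singly charged sodium adducts ([M+Na]+). The residue set, the search limits in `max_counts` and the `tolerance` value are illustrative choices, and the function names are invented for this example. A composition match cannot resolve sequence, branching or linkage, which is why fragment (MS/MS) interpretation remains the harder bottleneck.

```python
from itertools import product

# Monoisotopic residue masses (Da): free monosaccharide minus one water,
# i.e. the mass each residue contributes inside a glycan chain.
RESIDUE_MASS = {
    "Hex": 162.05282,     # e.g. Man, Gal, Glc
    "HexNAc": 203.07937,  # e.g. GlcNAc, GalNAc
    "dHex": 146.05791,    # e.g. Fuc
    "NeuAc": 291.09542,   # N-acetylneuraminic acid
}
WATER = 18.01056   # one water per free (reducing-end) glycan
SODIUM = 22.98922  # mass of a Na+ ion (assumed adduct)


def composition_mass(comp, adduct=SODIUM):
    """Monoisotopic m/z of the singly charged [M+Na]+ ion of a composition."""
    return sum(RESIDUE_MASS[r] * n for r, n in comp.items()) + WATER + adduct


def match_peak(mz, tolerance=0.02, max_counts=None):
    """Return (composition, delta) pairs whose [M+Na]+ m/z is within tolerance.

    Brute-force enumeration over all residue counts up to max_counts
    (hypothetical default limits chosen for illustration only).
    """
    if max_counts is None:
        max_counts = {"Hex": 12, "HexNAc": 8, "dHex": 4, "NeuAc": 4}
    names = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        # Keep only residues that are actually present.
        comp = {n: c for n, c in zip(names, counts) if c}
        if not comp:
            continue
        delta = composition_mass(comp) - mz
        if abs(delta) <= tolerance:
            hits.append((comp, round(delta, 4)))
    return hits
```

For example, `match_peak(1257.42)` includes the composition Hex5HexNAc2 among its hits, which corresponds to the high-mannose glycan Man5GlcNAc2. Exhaustive enumeration is adequate at this scale (a few thousand candidates). However, the number of possible isomers behind each composition grows explosively, so composition matching only narrows the search and does not determine structure.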
