Bioinformatics Research and Applications

Mass spectrometry is now the method of choice for protein characterization in proteomics. Computer algorithms and software play an essential role in analyzing the large amounts of mass spectrometry data produced in any proteomics experiment. The fundamental task of such analyses is to identify the peptide for each spectrum in the data. Such identification is called “database search” if it requires the assistance of a protein database, and “de novo sequencing” if not. In the past 20 years, many database search software tools have been developed for peptide identification, and one in particular, Mascot, developed in 1999, became dominant in the market. While new tools were continuously published in the following decade, none significantly improved on Mascot. The situation was disrupted around 2010, when the field witnessed a flurry of new database search tools that significantly improved on Mascot in terms of both accuracy and sensitivity. In the first part of the talk, the peptide identification problem will be introduced and its history briefly reviewed. In the second part, some practical concerns in using bioinformatics tools in a proteomics lab will be discussed. Properly dealing with these concerns resulted in the significant improvement witnessed in the past few years. The second part will focus on research conducted in the author’s own group.

Z. Cai et al. (Eds.): ISBRA 2013, LNBI 7875, p. 1, 2013. © Springer-Verlag Berlin Heidelberg 2013
