Algorithms in Bioinformatics

Knowing the location of a protein within the cell is important for understanding its function, its role in biological processes, and its potential use as a drug target. Much progress has been made in developing computational methods that predict a single location per protein, on the assumption that each protein localizes to one location. However, it has been shown that proteins can localize to multiple locations. While a few recent systems have attempted to predict multiple locations of proteins, they typically either treat locations as independent or capture inter-dependencies by treating each combination of locations present in the training set as an individual location class. We present a new method, and a preliminary system we have developed, that directly incorporates inter-dependencies among locations into the multiple-location prediction process, using a collection of Bayesian network classifiers. We evaluate our system on a dataset of single- and multi-localized proteins. Our results, obtained by incorporating inter-dependencies, are significantly higher than those obtained by classifiers that do not use inter-dependencies. The performance of our system on multi-localized proteins is comparable to that of a top-performing system (YLoc), without restricting predictions to location combinations present in the training set.
