Core Column Prediction for Alignments

In a computed multiple sequence alignment, the coreness of a column is the fraction of its substitutions that also appear in core columns of the reference alignment of the sequences, where the core columns of the reference are those that are reliably correct. Because the reference alignment is unknown in practice, the coreness of a column can only be estimated. This chapter describes the first method for estimating column coreness for protein multiple sequence alignments.
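To make the definition concrete, the following minimal sketch computes the true coreness of each column for the case where a reference alignment and its core columns are known, as in a benchmark. It illustrates the quantity being estimated, not the estimation method of this chapter. It assumes that a substitution means a pair of residues from two different sequences aligned in the same column, that gaps are written as '-', and that a column with no substitutions gets coreness 0.0. The function names and the toy alignments are hypothetical.

```python
from itertools import combinations

def residue_indices(row):
    """For each column of a gapped row, give the index of the residue there, or None for a gap."""
    indices, k = [], 0
    for ch in row:
        if ch == '-':
            indices.append(None)
        else:
            indices.append(k)
            k += 1
    return indices

def column_pairs(index_rows, col):
    """Set of residue pairs ((i, p), (j, q)) aligned in one column (assumed meaning of 'substitution')."""
    residues = [(i, idx[col]) for i, idx in enumerate(index_rows) if idx[col] is not None]
    return set(combinations(residues, 2))

def coreness(computed, reference, core_cols):
    """Per-column fraction of substitutions that also occur in a core column of the reference."""
    comp_idx = [residue_indices(r) for r in computed]
    ref_idx = [residue_indices(r) for r in reference]
    core_pairs = set()
    for c in core_cols:
        core_pairs |= column_pairs(ref_idx, c)
    scores = []
    for col in range(len(computed[0])):
        pairs = column_pairs(comp_idx, col)
        # Assumed convention: a column with no substitutions has coreness 0.0.
        scores.append(len(pairs & core_pairs) / len(pairs) if pairs else 0.0)
    return scores

# Toy data (hypothetical): three sequences ACG, ACTG, ATG.
reference = ["AC-G", "ACTG", "A-TG"]
core_cols = [0, 3]                      # reference columns taken to be core
computed = ["AC-G-", "ACTG-", "-A-TG"]  # a computed alignment of the same sequences
scores = coreness(computed, reference, core_cols)
```

In this toy case the fourth computed column aligns three residues, giving three substitutions, of which only one also appears in a core column of the reference.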
