Combining intrinsic disorder prediction and augmented training of hidden Markov models improves discriminative motif discovery

Abstract The discovery of short linear motifs (SLiMs) is usually hampered by a shortage of available sequences. SLiMs of the same functional site class typically share similar motif patterns, which makes them hard to distinguish with generative motif discovery methods. A discriminative method based on maximum mutual information estimation (MMIE) of hidden Markov models (HMMs) is proposed. It masks ordered regions to improve the signal-to-noise ratio and augments the training set to reduce the impact of the sequence shortage. Experimental results on a dataset selected from the Eukaryotic Linear Motif (ELM) resource show that the proposed method is effective and practical.
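The masking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that per-residue disorder scores in [0, 1] are already available from some intrinsic disorder predictor, and that residues scoring below a cutoff are treated as ordered and replaced by a mask symbol. The function name, the 0.5 cutoff, and the `X` mask character are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def mask_ordered_regions(
    sequence: str,
    disorder_scores: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
    mask_char: str = "X",    # assumed mask symbol
) -> str:
    """Replace residues predicted to be ordered with a mask character.

    A residue is kept when its disorder score is at or above the
    threshold; otherwise it is masked so that it contributes no
    motif signal during training.
    """
    if len(sequence) != len(disorder_scores):
        raise ValueError("sequence and disorder_scores must have equal length")
    return "".join(
        residue if score >= threshold else mask_char
        for residue, score in zip(sequence, disorder_scores)
    )


# Hypothetical scores for a 7-residue toy sequence.
masked = mask_ordered_regions("MKPLSTQ", [0.1, 0.2, 0.7, 0.8, 0.9, 0.4, 0.6])
print(masked)
```

Masking instead of deleting the ordered residues keeps sequence positions intact, so any motif found in the masked sequence can be mapped back to the original protein coordinates.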
