Discovery of DNA Motif Utilising an Integrated Strategy Based on Random Projection and Particle Swarm Optimization

During gene expression and regulation, the genetic information in DNA is transferred to protein through transcription and translation. Recognizing transcription factor binding sites helps to reveal the evolutionary relations among different sequences. The problem of recognizing transcription factor binding sites, i.e., motif recognition, therefore plays an important role in understanding the biological functions or meanings of sequences. However, when the search space contains many noisy subsequences, many optimization algorithms tend to become trapped in local optima. To address this problem, a particle swarm optimization and random projection-based algorithm (PSORPS) is proposed for recognizing DNA motifs. First, a random projection strategy is employed to filter out the noisy subsequences when constructing the objective space. In the same step, sequence segments that occur in the majority of the DNA sequences are extracted and used to initialize the PSO population. Then, the motifs of the DNA sequences are searched automatically by a purpose-designed PSO algorithm in the constructed l-mer objective space. Finally, to alleviate base deviation and further improve recognition accuracy, two operators, associated drift and independent drift, are applied to the optimization results obtained by PSO. Experiments are conducted on real-world biological datasets, and the results verify the effectiveness of the proposed algorithm.
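
The random projection step can be illustrated with a minimal sketch. It is not the PSORPS implementation; it follows the general idea of random projection for motif finding: hash every l-mer by the letters at k randomly chosen positions, then keep only the buckets whose l-mers come from enough distinct sequences. The function names, the parameters `k`, `seed`, and `min_sequences`, and the toy sequences are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_projection_buckets(sequences, l, k, seed=0):
    """Group every l-mer by the letters at k randomly chosen positions.

    Hypothetical helper: returns the chosen positions and a dict that maps
    each projected key to a list of (sequence index, start offset) pairs.
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(l), k))  # k distinct positions in [0, l)
    buckets = defaultdict(list)
    for seq_idx, seq in enumerate(sequences):
        for start in range(len(seq) - l + 1):
            lmer = seq[start:start + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((seq_idx, start))
    return positions, buckets


def enriched_buckets(buckets, min_sequences):
    """Keep buckets whose l-mers come from at least min_sequences sequences.

    Buckets below the threshold are treated as noise and dropped
    (an assumed filtering rule, chosen for this sketch).
    """
    kept = {}
    for key, hits in buckets.items():
        if len({seq_idx for seq_idx, _ in hits}) >= min_sequences:
            kept[key] = hits
    return kept


if __name__ == "__main__":
    # Toy input: each sequence contains the planted 4-mer "ACGT".
    toy = ["TTACGTAA", "GGACGTCC", "CCACGTGG"]
    _, buckets = random_projection_buckets(toy, l=4, k=3, seed=1)
    for key, hits in enriched_buckets(buckets, min_sequences=3).items():
        print(key, hits)
```

The l-mers in the surviving buckets are the kind of segments that could seed a swarm's initial population; in practice the projection is usually repeated with several random position sets so that a motif instance carrying mutations at some positions still lands in a shared bucket at least once.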
