A particle swarm optimization solution for challenging planted (l, d)-Motif problem

In bioinformatics, Planted (l, d)-Motif finding is an important and challenging problem with many applications. Its goal is generally to locate recurring patterns in the promoter regions of co-expressed or co-regulated genes. Because biological mutations mean that the occurrences of a pattern cannot be expected to match exactly, motif finding becomes an NP-complete problem. Scientists have proposed many solutions in the literature, each approximating the problem in a different way. These solutions are either “exact” or “approximate”. All the proposed exact solutions take exponential time, so they need more time as the parameters l and d grow. Problems in bioinformatics seldom need the exact optimum; what they need are robust, fast, near-optimal solutions. It is therefore impractical to use an exact algorithm to search for motifs with large parameters in real biological datasets. In this paper, we combine Particle Swarm Optimization (PSO) with the k-nearest neighbor algorithm to solve the Planted (l, d)-Motif Finding Problem. PSO is a global approximate optimization technique with wide applications. It searches for the global best solution by adjusting the trajectory of each individual towards its own best location and towards the best particle of the swarm at each generation. We performed experiments on synthetic data, increasing the number of sequences and the sequence length for different (l, d)-motifs, on the following data sets: general instances (10, 2), (11, 2), (12, 3), (15, 4), (16, 5), (18, 6), (20, 7), (30, 11) and (40, 15); and challenging instances (9, 2), (11, 3), (13, 4), (15, 5), (20, 7), (30, 11) and (40, 15). Finally, we applied the proposed method to real biological sequences. The experimental results show that the proposed algorithm is more efficient and accurate than existing approximation algorithms, and that it also works better for larger motif instances.
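
To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the method of the paper. It assumes the usual reading of the planted (l, d)-motif problem: a candidate of length l is a motif if every sequence contains some l-mer within Hamming distance d of it. The fitness function `total_distance`, the helper names, and the coefficients `w`, `c1`, `c2` are illustrative assumptions. The abstract does not specify how particles encode motifs, what fitness is used, or how the k-nearest neighbor step enters, so `pso_step` shows only the canonical real-valued PSO update.

```python
import random


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(candidate, sequence):
    """Smallest Hamming distance from candidate to any l-mer of sequence.

    Assumes len(sequence) >= len(candidate).
    """
    l = len(candidate)
    return min(hamming(candidate, sequence[i:i + l])
               for i in range(len(sequence) - l + 1))


def is_planted_motif(candidate, sequences, d):
    """True if every sequence has an l-mer within d mismatches of candidate."""
    return all(min_distance(candidate, s) <= d for s in sequences)


def total_distance(candidate, sequences):
    """Hypothetical fitness: sum of per-sequence minimum distances.

    Lower is better; this choice is an assumption, not taken from the paper.
    """
    return sum(min_distance(candidate, s) for s in sequences)


def pso_step(x, v, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One canonical PSO update for a real-valued particle.

    The velocity is pulled towards the particle's own best position (pbest)
    and the swarm's best position (gbest). w, c1 and c2 are illustrative.
    """
    new_v = []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        new_v.append(w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi))
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

For instance, `is_planted_motif("ACGTACGTAC", sequences, 2)` checks a candidate against a (10, 2) instance, and `pso_step(x, v, pbest, gbest, random.Random(0))` advances one particle by one generation. A motif search would additionally need a mapping between real-valued positions and l-mers, which is left open here.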
