A modified Henry gas solubility optimization for solving motif discovery problem

The DNA motif discovery (MD) problem is a central challenge in genome biology, and its importance grows with advances in sequencing technologies. MD plays a vital role in identifying transcription factor binding sites, which help reveal the mechanisms that regulate gene expression. Metaheuristic algorithms are promising techniques for eliciting motifs from genomic DNA sequences, but they often fail to perform robustly because the inherent complexity of gene sequences makes the search landscape highly non-convex for optimization methods. This paper proposes a novel modified Henry gas solubility optimization (MHGSO) algorithm for motif discovery that elicits functional motifs in genomic DNA sequences. In our approach, we introduce a new stage that captures the main characteristics of motifs in DNA sequences, and MHGSO imitates these characteristics to detect the target motif accurately. The performance of MHGSO is validated on both synthetic and real datasets. Results confirm the stability and superiority of the proposed algorithm compared with state-of-the-art algorithms, including MEME, DREME, XXmotif, PMbPSO, and MACS. Across several evaluation metrics, MHGSO outperforms the competing techniques in terms of nucleotide-level correlation coefficient, recall, precision, F-score, Cohen’s kappa, and statistical validation measures.
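To make the optimization view of motif discovery concrete: metaheuristic motif finders commonly encode a candidate solution as one start position per sequence, cut a fixed-width site at each position, and score how well the sites agree. The sketch below uses one widely used objective, the mean per-column majority frequency of the aligned sites (a consensus or similarity score). This objective, the helper names `extract_sites` and `consensus_score`, and the toy sequences are illustrative assumptions; they are not necessarily the fitness function or data used by MHGSO.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def extract_sites(seqs: Sequence[str], starts: Sequence[int], width: int) -> List[str]:
    """Cut one candidate site of length `width` from each sequence (hypothetical helper)."""
    if len(seqs) != len(starts):
        raise ValueError("need exactly one start position per sequence")
    sites = []
    for seq, s in zip(seqs, starts):
        if not 0 <= s <= len(seq) - width:
            raise ValueError(f"start {s} out of range for a sequence of length {len(seq)}")
        sites.append(seq[s:s + width])
    return sites


def consensus_score(sites: Sequence[str]) -> Tuple[str, float]:
    """Return the consensus motif and the mean per-column majority frequency.

    Assumed example objective, not MHGSO's published fitness: a score of 1.0
    means all sites are identical; lower values mean weaker agreement.
    """
    width = len(sites[0])
    consensus, total = [], 0.0
    for j in range(width):
        # Most frequent base in column j and how many sites carry it.
        base, count = Counter(site[j] for site in sites).most_common(1)[0]
        consensus.append(base)
        total += count / len(sites)
    return "".join(consensus), total / width


seqs = ["TTACGTAGG", "GACGTACC", "CCCACGTTA"]
sites = extract_sites(seqs, starts=[2, 1, 3], width=5)
motif, score = consensus_score(sites)
print(sites, motif, round(score, 4))  # ['ACGTA', 'ACGTA', 'ACGTT'] ACGTA 0.9333
```

A metaheuristic such as HGSO would search over the `starts` vector to maximize a score of this kind; the non-convexity mentioned above comes from the fact that shifting a single start position by one base can change the score abruptly.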

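The nucleotide-level measures named in the abstract can all be derived from one confusion matrix over nucleotide positions: a position is a true positive when it lies inside both a known and a predicted binding site. The sketch below uses standard definitions (the nucleotide-level correlation coefficient is the Matthews correlation coefficient of that matrix, and Cohen’s kappa compares observed agreement with chance agreement). The interval format, the function names `confusion` and `scores`, and the toy data are assumptions for illustration; the paper's exact evaluation conventions may differ.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Interval = Tuple[int, int]  # half-open nucleotide range [start, end)


def covered(intervals: Sequence[Interval]) -> Set[int]:
    """Set of nucleotide positions covered by a list of sites."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def confusion(lengths: Sequence[int],
              known: Dict[int, List[Interval]],
              predicted: Dict[int, List[Interval]]) -> Tuple[int, int, int, int]:
    """Count nucleotide-level TP, FP, FN, TN over all sequences (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for i, length in enumerate(lengths):
        k, p = covered(known.get(i, [])), covered(predicted.get(i, []))
        tp_i, fp_i, fn_i = len(k & p), len(p - k), len(k - p)
        tp, fp, fn = tp + tp_i, fp + fp_i, fn + fn_i
        tn += length - tp_i - fp_i - fn_i  # positions in neither set
    return tp, fp, fn, tn


def _div(a: float, b: float) -> float:
    """Division that returns 0.0 for an empty denominator."""
    return a / b if b else 0.0


def scores(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Recall, precision, F-score, nCC and Cohen's kappa from the confusion matrix."""
    recall = _div(tp, tp + fn)
    precision = _div(tp, tp + fp)
    f_score = _div(2 * precision * recall, precision + recall)
    ncc = _div(tp * tn - fn * fp,
               math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)))
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n                                        # observed agreement
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2  # chance agreement
    kappa = _div(p_obs - p_exp, 1 - p_exp)
    return {"recall": recall, "precision": precision, "F": f_score,
            "nCC": ncc, "kappa": kappa}


counts = confusion([20, 20],
                   known={0: [(5, 10)], 1: [(2, 7)]},
                   predicted={0: [(7, 12)], 1: [(2, 7)]})
print(counts, scores(*counts))
```

In this toy example the first prediction overlaps its known site by three bases and the second matches exactly, giving TP = 8, FP = 2, FN = 2, TN = 28. Because FP equals FN, the marginals are symmetric and kappa coincides with nCC (both 11/15); on real data the two measures generally differ.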