Mining DNA Sequence Patterns with Constraints Using Hybridization of Firefly and Group Search Optimization

Abstract: DNA sequence mining is essential to the study of the structure and function of DNA sequences. Several research works on sequence mining as a data mining task have been published in the literature. In our previous paper, sequence mining was performed on a DNA database using constraint measures and group search optimization (GSO). In that study, the GSO algorithm was used to optimize the extraction of sequences from a given DNA database. However, such a random search scheme does not always reach the optimal solution within the given time. To overcome this problem, we propose in this work a multiple-constraint approach based on a hybrid firefly and GSO (HFGSO) algorithm. The complete DNA sequence mining process comprises three modules: (i) applying the PrefixSpan algorithm; (ii) calculating the length, width, and regular expression (RE) constraints; and (iii) optimal mining via HFGSO. First, we apply PrefixSpan, which detects frequent DNA sequence patterns using a prefix tree. Based on this prefix tree, the length, width, and RE constraints are applied to restrict the set of patterns. Finally, we adopt the HFGSO algorithm to obtain a complete mining result. Experiments are carried out on a standard DNA sequence dataset, and the results show that our approach performs better than the existing approach.
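As an illustration of modules (i) and (ii), the following is a minimal sketch of prefix-projected pattern growth over DNA strings, followed by length and RE filtering. It is not the authors' implementation: the function names `prefixspan` and `apply_constraints`, the toy sequences, the support threshold, and the example regular expression are all assumptions made for illustration. The width constraint and the HFGSO optimization step are omitted because the abstract does not define them.

```python
import re
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every frequent subsequence (gaps allowed).

    Support is the number of input sequences that contain the pattern.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start position of the suffix),
        # with at most one entry per sequence.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seen = set()
            for pos in range(start, len(sequences[seq_id])):
                symbol = sequences[seq_id][pos]
                if symbol not in seen:
                    # Keep only the leftmost occurrence per sequence.
                    seen.add(symbol)
                    occurrences[symbol].append((seq_id, pos + 1))
        for symbol, new_projected in occurrences.items():
            if len(new_projected) >= min_support:
                pattern = prefix + symbol
                results[pattern] = len(new_projected)
                grow(pattern, new_projected)

    grow("", [(i, 0) for i in range(len(sequences))])
    return results


def apply_constraints(patterns, min_len, max_len, regex):
    """Keep patterns whose length is in range and that fully match the RE."""
    compiled = re.compile(regex)
    return {
        pattern: support
        for pattern, support in patterns.items()
        if min_len <= len(pattern) <= max_len and compiled.fullmatch(pattern)
    }


# Hypothetical toy database, not taken from the paper's dataset.
toy_db = ["ACGT", "ACT", "AGT"]
frequent = prefixspan(toy_db, min_support=2)
constrained = apply_constraints(frequent, min_len=2, max_len=3, regex=r"A[ACGT]*T")
print(constrained)
```

In this sketch the constraints are applied after mining for clarity; a constraint-aware miner could instead prune inside `grow`, for example by stopping recursion once the maximum length is reached.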
