FACC: A Novel Finite Automaton Based on Cloud Computing for the Multiple Longest Common Subsequences Search

Searching for the multiple longest common subsequences (MLCS) has significant applications in bioinformatics, information processing, data mining, and related areas. Although a few parallel MLCS algorithms have been proposed, their efficiency and effectiveness are not satisfactory given the increasing complexity and size of biological data. To overcome the shortcomings of the existing MLCS algorithms, and considering that the MapReduce parallel framework of cloud computing is a promising technology for cost-effective, high-performance parallel computing, a novel finite automaton (FA) based on cloud computing, called FACC, is proposed under the MapReduce parallel framework to obtain a more efficient and effective general parallel MLCS algorithm. FACC adopts the ideas of matched pairs and finite automata: it preprocesses the sequences, constructs successor tables, and builds a common-subsequence finite automaton to search for the MLCS. Simulation experiments on a set of benchmarks drawn from real DNA and amino acid sequences show that the proposed FACC algorithm outperforms the current leading parallel MLCS algorithm, FAST-MLCS.
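The abstract does not spell out how the successor tables are built or how the automaton uses them, so the following is only a minimal serial sketch of the general idea, under stated assumptions, and not the FACC algorithm itself. It assumes a successor table maps each character and each position j to the smallest later position holding that character, and it treats a tuple of positions (one per sequence) as a state, following a transition on a character only when that character still occurs in every sequence. The names `successor_table` and `mlcs_length` are hypothetical, and no MapReduce parallelism is shown.

```python
from functools import lru_cache


def successor_table(seq, alphabet):
    """Assumed definition: table[c][j] is the smallest 1-based position k > j
    with seq[k] == c, or None if c does not occur after position j."""
    n = len(seq)
    table = {c: [None] * (n + 1) for c in alphabet}
    for c in alphabet:
        nxt = None
        # Scan right to left so each entry sees the nearest later occurrence.
        for j in range(n, 0, -1):
            if seq[j - 1] == c:  # seq[j - 1] is the character at 1-based position j
                nxt = j
            table[c][j - 1] = nxt
    return table


def mlcs_length(seqs, alphabet):
    """Length of a longest common subsequence of all sequences, found by
    walking states (tuples of positions) through the successor tables."""
    tables = [successor_table(s, alphabet) for s in seqs]

    @lru_cache(maxsize=None)
    def best(point):
        result = 0
        for c in alphabet:
            # Next matched position of character c in every sequence.
            nxt = tuple(t[c][p] for t, p in zip(tables, point))
            if None not in nxt:  # c still occurs in all sequences
                result = max(result, 1 + best(nxt))
        return result

    # Start before the first character of every sequence.
    return best(tuple(0 for _ in seqs))


if __name__ == "__main__":
    print(successor_table("ACGA", "ACG")["A"])
    print(mlcs_length(["ACGT", "AGCT", "GACT"], "ACGT"))
```

Precomputing the tables makes each transition a constant-time lookup per sequence, which is the usual reason for building them before the search; how FACC distributes this work across MapReduce tasks is not described in the abstract and is left out here.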
