Generalized Suffix Tree Based Multiple Sequence Alignment for Service Virtualization

Assuring the quality of contemporary software systems is challenging because the deployment environments in which they operate are often highly complex. Service virtualization addresses this challenge by emulating the services within the deployment environment, either by synthesising service response messages from models or by recording and then replaying the messages exchanged between the services and the system. Record-and-replay techniques require (i) a way to derive message prototypes from recorded system interactions (i.e. request-response sequences), (ii) a scheme to match incoming request messages against those prototypes, and (iii) a method to synthesise response messages based on similarities between incoming messages and the recorded interactions. Previous approaches to service virtualization have relied on a multiple sequence alignment (MSA) algorithm to find the common patterns of similarity and difference between messages that all three steps require. In this paper, we present a novel MSA algorithm based on Generalized Suffix Trees (GSTs). We evaluated the accuracy and efficiency of the proposed algorithm on six enterprise service message trace datasets, where it ran up to 50 times faster than standard MSA approaches. The algorithm is also applicable to domains beyond service virtualization.
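To illustrate the general idea of aligning messages around shared substrings, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that the longest substring common to all messages can serve as an alignment anchor, and it finds that substring by brute force, whereas a GST would locate such substrings far more efficiently. The function names, the `min_anchor` threshold, the `-` gap character, and the sample messages are all hypothetical choices made for this example.

```python
def longest_common_substring(messages):
    """Return the longest substring present in every message.

    Brute-force search over substrings of the shortest message;
    this is only a stand-in for a suffix-tree based lookup.
    """
    if not messages:
        return ""
    shortest = min(messages, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in m for m in messages):
                return candidate
    return ""


def align_on_anchors(messages, min_anchor=3):
    """Split messages into aligned columns around common anchors.

    Returns a list of columns; each column holds one string per message.
    Anchor columns are identical across messages; the remaining columns
    hold the differing parts, right-padded with '-' as a gap symbol
    (assumed here not to occur in the messages themselves).
    """
    anchor = longest_common_substring(messages)
    if len(anchor) < min_anchor:
        if not any(messages):
            return []  # nothing left to align on this side
        width = max(len(m) for m in messages)
        return [[m.ljust(width, "-") for m in messages]]
    # Split every message at its first occurrence of the anchor.
    parts = [m.partition(anchor) for m in messages]
    lefts = [p[0] for p in parts]
    rights = [p[2] for p in parts]
    return (align_on_anchors(lefts, min_anchor)
            + [[anchor] * len(messages)]
            + align_on_anchors(rights, min_anchor))


if __name__ == "__main__":
    requests = [
        "GET /users/alice HTTP/1.1",
        "GET /users/bob HTTP/1.1",
        "GET /users/carol HTTP/1.1",
    ]
    for column in align_on_anchors(requests):
        print(column)
```

In this sketch the columns that are identical across all messages play the role of the fixed parts of a message prototype, while the padded columns mark the positions where messages vary.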
