New versions of existing large-scale web services such as Passport.com must pass rigorous performance evaluations to ensure a high degree of availability. Performance testing (benchmarking, scalability, and capacity tests) of large-scale stateful systems in managed test environments poses many challenges, most of them related to reproducing the conditions of live production data centers. One challenge is creating a dataset in the test environment that mimics the actual production dataset. Others are characterizing production load patterns through log analysis and simulating load properly by reusing data from the existing dataset. This paper describes practical approaches to some of these challenges through the use of several novel techniques. It discusses data sanitization, the controlled alteration of large datasets to obfuscate sensitive information while preserving data integrity, relationships, and data equivalence classes. It also presents techniques for characterizing load patterns by applying Markov chains to custom and generic logs, together with general guidelines for developing cache-based load simulation tools tailored to the performance evaluation of stateful systems.
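As a rough illustration of the Markov chain characterization mentioned above, the sketch below estimates first-order transition probabilities between request types from per-session request sequences. It is not the paper's method: the function name `estimate_transitions`, the session layout, and the request names (`login`, `profile`, `logout`) are all assumptions made for this example, and real logs would first need to be parsed and grouped by session.

```python
from collections import Counter, defaultdict


def estimate_transitions(sessions):
    """Estimate first-order Markov transition probabilities.

    `sessions` is a list of request-name sequences, one per user session
    (a hypothetical layout; actual log formats will differ).
    Returns {state: {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for requests in sessions:
        # Count each observed (current request -> next request) pair.
        for current, following in zip(requests, requests[1:]):
            counts[current][following] += 1

    matrix = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        # Normalize counts so each row sums to 1.
        matrix[state] = {nxt: n / total for nxt, n in followers.items()}
    return matrix


# Hypothetical sessions, as they might look after log parsing.
sessions = [
    ["login", "profile", "logout"],
    ["login", "profile", "profile", "logout"],
    ["login", "logout"],
]

transitions = estimate_transitions(sessions)
for state, row in transitions.items():
    print(state, row)
```

A load simulator could then choose each virtual user's next request by sampling from the row for its current request, so that the simulated request mix follows the transitions observed in the logs.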