Stratified sampling of execution traces: Execution phases serving as strata

Understanding the behavior of a software system is an important enabler for many reverse engineering activities. Software behavior is typically represented in the form of execution traces. Traces, however, can be overwhelmingly large. To reduce their size, sampling techniques, especially those based on random sampling, have been used extensively. Random sampling, however, may produce samples that are not representative of the original trace. In this paper, we propose a trace sampling technique that not only reduces the size of a trace but also yields a representative sample, by ensuring that the desired characteristics of an execution are distributed similarly in the sampled trace and the original trace. Hence, the insights gained from analyzing the sampled trace can be extrapolated to the original execution trace. Our approach is based on stratified sampling instead of random sampling and uses execution phases as strata. We define an execution phase as a part of a trace that represents a specific task of the traced system. We also present an approach for automatically detecting execution phases in a trace. Finally, we show the effectiveness of our sampling technique through two case studies.
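
To illustrate the general idea of stratified sampling with execution phases as strata, the following minimal Python sketch draws the same fraction of events from every phase, so that each phase keeps roughly its original share of the sample. It is an illustration under stated assumptions, not the paper's implementation: the phase labels are assumed to be already available (the automatic phase detection step is not shown), proportional allocation is assumed as the allocation rule, and the names `stratified_sample`, `trace`, and `phases` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(events, phase_of, fraction, seed=0):
    """Sample roughly `fraction` of the events from each phase (stratum).

    events   -- list of trace events (e.g. method-call names)
    phase_of -- list of phase labels, one per event (assumed to be given)
    fraction -- sampling fraction in (0, 1], applied to every phase
    """
    rng = random.Random(seed)

    # Group event positions by the phase they belong to.
    strata = defaultdict(list)
    for index, phase in enumerate(phase_of):
        strata[phase].append(index)

    # Proportional allocation: each phase contributes the same fraction,
    # with at least one event so that no phase disappears from the sample.
    chosen = []
    for indices in strata.values():
        k = min(len(indices), max(1, round(fraction * len(indices))))
        chosen.extend(rng.sample(indices, k))

    chosen.sort()  # keep the sampled events in original trace order
    return [(phase_of[i], events[i]) for i in chosen]


# Hypothetical trace: 10 initialization events, 80 processing events,
# and 10 shutdown events.
trace = [f"call_{i}" for i in range(100)]
phases = ["init"] * 10 + ["process"] * 80 + ["shutdown"] * 10

sample = stratified_sample(trace, phases, fraction=0.1)
print(Counter(phase for phase, _ in sample))
```

With plain random sampling at the same rate, a short phase such as `init` or `shutdown` could be missed entirely; sampling within each phase guarantees that every phase is represented in proportion to its size.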
