A Comparative Study of Vectorization-Based Static Test Case Prioritization Methods

To improve the efficiency of software testing, researchers have studied various test case prioritization (TCP) methods. Topic model-based TCP is a promising method. It represents test cases as topic vectors and orders them so that the set of already-prioritized test cases has the maximum dispersion in the vector space. However, the topic model is not the only option for vectorizing test cases. There are also several options for the distance metric in the vector space and for the prioritization scheme, i.e., the way to find the test case that is farthest from the set of already-prioritized ones. Because combinations of these options have not been well studied, this paper conducts a comparative study of 36 TCP methods. These are the combinations of (1) three vectorization methods, (2) three distance metrics, and (3) four prioritization schemes (36 = 3 × 3 × 4). The empirical results yield the following findings. The choice of vectorization method has a significant impact on testing efficiency, and Doc2Vec (PV-DBoW) is a promising option. The pairing of a vectorization method with a distance metric may also be impactful: Doc2Vec (PV-DBoW) with Euclidean distance is a useful combination. The third aspect, the choice of the scheme for finding the farthest test case, is not always influential.
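The prioritization idea above repeatedly picks the test case farthest from the already-prioritized set. A minimal Python sketch of that greedy loop is shown here. It assumes test cases have already been vectorized, for example by a topic model or by Doc2Vec (in gensim, PV-DBoW corresponds to `Doc2Vec(dm=0)`). The names `prioritize`, `min_to_set`, and `mean_to_set`, the two set-distance rules, and the seeding rule are hypothetical illustrations. They are not the three metrics or four schemes evaluated in the paper.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


# Hypothetical "distance to a set" rules: how far a candidate is from the
# already-prioritized test cases, given its distances to each of them.
def min_to_set(dists: List[float]) -> float:
    return min(dists)


def mean_to_set(dists: List[float]) -> float:
    return sum(dists) / len(dists)


def prioritize(
    vectors: Dict[str, Vector],
    distance: Callable[[Vector, Vector], float] = euclidean,
    set_distance: Callable[[List[float]], float] = min_to_set,
) -> List[str]:
    """Greedy farthest-first ordering of test cases by their vectors."""
    ids = sorted(vectors)  # sorted so ties break deterministically by name
    if not ids:
        return []
    # Seed (assumption): the test case with the largest total distance to all others.
    first = max(ids, key=lambda t: sum(distance(vectors[t], vectors[u]) for u in ids))
    order = [first]
    remaining = [t for t in ids if t != first]
    while remaining:
        # Choose the candidate farthest from the already-prioritized set.
        nxt = max(
            remaining,
            key=lambda t: set_distance([distance(vectors[t], vectors[p]) for p in order]),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For example, with `vecs = {"t1": [0, 0], "t2": [1, 0], "t3": [10, 0], "t4": [0, 10]}`, `prioritize(vecs)` returns `["t4", "t3", "t1", "t2"]`. Swapping `distance` or `set_distance` changes only how "farthest" is measured, which corresponds to the distance-metric and prioritization-scheme axes the study varies alongside the vectorization method.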
