Anvaya: An Algorithm and Case-Study on Improving the Goodness of Software Process Models Generated by Mining Event-Log Data in Issue Tracking Systems

Issue Tracking Systems (ITS) such as Bugzilla can be viewed as Process Aware Information Systems (PAIS) that generate event logs during the life-cycle of a bug report. Process mining consists of mining the event logs generated by a PAIS for process model discovery, conformance checking and enhancement. We apply process map discovery techniques to event trace data from the ITS of the open-source Firefox browser project in order to generate and study process models. Bug life-cycles exhibit considerable diversity and variance. As a result, the process models generated from the event logs are spaghetti-like, with a large number of nodes, edges and inter-connections. Such models are complex to analyse and difficult for a process analyst to comprehend. We improve the Goodness (fitness and structural complexity) of the process models by splitting the event log into homogeneous subsets, obtained by clustering structurally similar traces. We adapt the K-Medoid clustering algorithm with two different distance metrics: Longest Common Subsequence (LCS) and Dynamic Time Warping (DTW). We evaluate the goodness of the process models generated from the clusters using complexity and fitness metrics. We study back-and-forth loops, self-loops, bug reopening and bottlenecks in the resulting clusters and show that clustering enables better analysis. We also propose an algorithm to automate the clustering process: it takes the event log as input and returns the best cluster set.
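The trace-clustering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The normalisation of the LCS distance (one minus LCS length divided by the longer trace length), the naive choice of the first k traces as initial medoids, the simple alternating assign/update loop, and all function names are assumptions made for illustration. Only the LCS metric is shown; DTW is omitted.

```python
from typing import List, Sequence, Tuple

Trace = Sequence[str]  # one bug report's ordered list of status events


def lcs_length(a: Trace, b: Trace) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: Trace, b: Trace) -> float:
    """Assumed normalisation: 0.0 for identical traces, 1.0 for no shared events."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 1.0 - lcs_length(a, b) / longest


def assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Label every trace with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(traces: List[Trace], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate between assigning traces and re-picking medoids until stable."""
    n = len(traces)
    dist = [[lcs_distance(traces[i], traces[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # assumption: naive deterministic initialisation
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][o] for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical traces built from Bugzilla-style status names.
traces = [
    ["NEW", "ASSIGNED", "RESOLVED", "VERIFIED"],
    ["UNCONFIRMED", "NEW", "RESOLVED", "REOPENED", "RESOLVED"],
    ["NEW", "ASSIGNED", "RESOLVED"],
    ["UNCONFIRMED", "NEW", "RESOLVED", "REOPENED", "RESOLVED", "VERIFIED"],
]
medoids, labels = k_medoids(traces, k=2)
print(medoids, labels)
```

In this toy log the two traces without a reopening fall into one cluster and the two with a reopening into the other, so a process model mined per cluster would be simpler than one mined from the combined log. A real event log would need a more careful initialisation and a criterion for choosing k.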
