Learning Tractable Probabilistic Models for Fault Localization

In recent years, several probabilistic techniques have been applied to various debugging problems. However, most existing probabilistic debugging systems use relatively simple statistical models and fail to generalize across multiple programs. In this work, we propose Tractable Fault Localization Models (TFLMs), which are learned from data and probabilistically infer the location of a bug. While most previous statistical debugging methods generalize over many executions of a single program, TFLMs are trained on a corpus of previously seen buggy programs and learn to identify recurring patterns of bugs. Widely used fault localization techniques such as TARANTULA evaluate the suspiciousness of each line in isolation; in contrast, a TFLM defines a joint probability distribution over per-line bug indicator variables. Joint distributions with rich dependency structure are often computationally intractable; TFLMs avoid this problem by exploiting recent developments in tractable probabilistic models (specifically, Relational SPNs). Further, TFLMs can incorporate additional sources of information, including coverage-based features such as TARANTULA scores. We evaluate the fault localization performance of TFLMs that include TARANTULA scores as features in the probabilistic model. Our study shows that the learned TFLMs isolate bugs more effectively than previous statistical methods and than TARANTULA used directly.
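As a point of reference for the per-line scoring that the abstract contrasts with TFLMs, the standard TARANTULA suspiciousness formula can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `tarantula_scores`, the representation of coverage as one set of executed line numbers per test, and the convention of scoring a line 0.0 when a ratio is undefined are all assumptions made for the example.

```python
def tarantula_scores(coverage, outcomes):
    """Score each covered line independently with the TARANTULA formula.

    coverage: list of sets; coverage[i] holds the line numbers run by test i.
    outcomes: list of bools; outcomes[i] is True if test i passed.
    (Both input shapes are assumptions made for this sketch.)
    """
    total_passed = sum(outcomes)
    total_failed = len(outcomes) - total_passed
    lines = set().union(*coverage)

    scores = {}
    for line in sorted(lines):
        passed = sum(1 for cov, ok in zip(coverage, outcomes) if ok and line in cov)
        failed = sum(1 for cov, ok in zip(coverage, outcomes) if not ok and line in cov)
        # Fraction of passing / failing tests that execute this line.
        pass_ratio = passed / total_passed if total_passed else 0.0
        fail_ratio = failed / total_failed if total_failed else 0.0
        denom = pass_ratio + fail_ratio
        # Assumed convention: a line with an undefined ratio scores 0.0.
        scores[line] = fail_ratio / denom if denom else 0.0
    return scores


# Three tests over a three-line program; the third test fails.
example = tarantula_scores([{1, 2, 3}, {1, 2}, {1, 3}], [True, True, False])
```

Each score depends only on the coverage counts for that one line, which is the "in isolation" property the abstract refers to. A TFLM, by contrast, could take such scores as input features while modeling dependencies between lines jointly.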
