Debugging program failure exhibited by voluminous data

It is difficult to debug a program when the data set that causes it to fail is large (or voluminous). The cues that may help in locating the fault are obscured by the large amount of information generated while processing the data set. Clearly, a smaller data set that exhibits the same failure should lead to a diagnosis of the fault more quickly than the initial, large data set. We term such a smaller data set a data slice and the process of creating it data slicing. The problem of creating a data slice is undecidable. In this paper, we investigate four generate-and-test heuristics for deriving a smaller data set that reproduces the failure exhibited by a large data set. The four heuristics are: invariance analysis, origin tracking, random elimination, and program-specific heuristics. We also provide a classification of programs based upon a certain relationship between their inputs and outputs. This classification may be used to choose an appropriate heuristic in a given debugging scenario. As evidenced by a database of debugging anecdotes at the Open University, U.K., debugging failures exhibited by large data sets requires inordinate amounts of time. Our data slicing techniques would significantly reduce the effort required in such scenarios.
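To make the generate-and-test idea concrete, the following is a minimal sketch of what a random-elimination loop might look like. It is an illustration under stated assumptions, not the procedure described in the paper: the data set is assumed to be a list of independent records, and the names `random_elimination`, `fails`, `rounds`, and `seed` are hypothetical. The caller supplies `fails`, a predicate that runs the program on a candidate data set and reports whether the original failure is reproduced.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def random_elimination(
    data: Sequence[T],
    fails: Callable[[List[T]], bool],
    rounds: int = 200,
    seed: int = 0,
) -> List[T]:
    """Generate-and-test sketch: shrink `data` while `fails` stays true.

    Hypothetical helper for illustration only. `fails(candidate)` must
    return True when the program under test still exhibits the failure.
    """
    rng = random.Random(seed)
    current = list(data)
    if not fails(current):
        raise ValueError("the initial data set must reproduce the failure")

    chunk = max(1, len(current) // 2)  # how many records to drop per attempt
    for _ in range(rounds):
        if len(current) <= 1:
            break
        size = min(chunk, len(current) - 1)
        start = rng.randrange(len(current) - size + 1)
        # Generate: drop a randomly placed run of `size` records.
        candidate = current[:start] + current[start + size:]
        # Test: keep the smaller set only if the failure is still reproduced.
        if fails(candidate):
            current = candidate
        else:
            chunk = max(1, chunk // 2)  # be less aggressive after a miss
    return current


if __name__ == "__main__":
    # Toy stand-in for a failing program: it "fails" on any negative record.
    records = list(range(1000))
    records[637] = -1
    data_slice = random_elimination(records, lambda rs: any(r < 0 for r in rs))
    print(len(records), "->", len(data_slice))
```

The result is a smaller data set that still triggers the failure, not necessarily a minimal one; since the general problem is undecidable, a fixed budget of attempts (`rounds` here) is one simple way to bound the search.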
