Sampling User Executions for Bug Isolation

Many computer scientists think of a program as either correct (i.e. it meets some specification) or incorrect (i.e. it does not meet some specification). But industrial software development is as much about economics as computer science. Software quality is a monetary balancing act among engineers’ salaries, time to market, user expectations, and other business concerns. We ship software when it seems correct enough to neither embarrass us nor alienate users. We ship software with known bugs that are not worth fixing, and users uncover new bugs that we never imagined.

Practitioners clearly need something other than a Boolean notion of correctness, but such a notion has been difficult to quantify. In-house testing can only guess at field usage patterns, and poor guesses can leave users in bad shape. An obscure, low-priority bug that was difficult to reproduce in the testing lab may turn out to affect large numbers of users on a regular basis. Technical support channels provide one way for post-deployment feedback to reach engineers, but traditionally these mechanisms have been informal and overly dependent on human intervention.

Widespread Internet connectivity makes possible a radical change to this situation. For the first time it is feasible to directly observe the reality of a software system’s deployment. Through sheer numbers, the user community brings far more resources to bear on exercising a piece of software than could possibly be provided by the software’s authors. Coupled with an instrumentation and reporting infrastructure, these users can potentially replace guesswork with real triage, directing scarce engineering resources to those areas that benefit the most people.
