Drive-by Analysis of Running Programs

Understanding the behavior of complex Java programs requires a balance of detail and volume. For example, a transaction server may perform poorly because it does not always precompile a database query. Establishing this piece of information requires a relatively fine level of detail: e.g., the sequence, context, and duration of individual method invocations. To collect this detail, we could generate a detailed trace of the program. Unfortunately, recording detailed execution traces quickly becomes infeasible for even reasonably complex programs, due to both time perturbation and space overhead. As a quick example, tracing IBM's Jinsight visualizer from the outset consumes 37MB by the time the main window has appeared; this figure does not even include the tracing of any values, such as arguments or return values. For a highly multithreaded server, such as IBM's WebSphere Application Server, traces of similar time frames can easily grow to ten times this size.

Yet the other extreme, tracing only aggregate statistics (such as heap consumption, method invocation counts, each method's average invocation time, or an aggregate call graph [1, 2, 3]), often will not reveal the root problem: why are these transactions slow, while those others are fast? Why does the program fail on some calls to one method, but not on other calls to that same method?

We propose that it is not just any details that will help, but rather details associated with a particular task. In our transaction server example, at any point in time the server will be processing transactions in many threads, and doing administrative work in others; Figure 1 shows such a case. Many types of transaction may be in progress at the same time, and each transaction will most likely be in a different stage of its work.
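To make the precompilation example above concrete, the following sketch contrasts a server that "compiles" a query on every request with one that caches the compiled form. The class, method names, and compilation counter are invented for illustration; a real server would use JDBC `PreparedStatement` objects, but the caching pattern is the same.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the precompilation issue mentioned in the text.
// compile() stands in for an expensive database query compilation; the
// cache reuses the compiled form, keyed by the SQL text.
public class StatementCacheDemo {
    static int compileCount = 0;

    // Stands in for an expensive PreparedStatement compilation.
    static String compile(String sql) {
        compileCount++;
        return "compiled:" + sql;
    }

    static final Map<String, String> cache = new HashMap<>();

    // Fast path: reuse the compiled statement when one exists.
    static String getStatement(String sql) {
        return cache.computeIfAbsent(sql, StatementCacheDemo::compile);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            getStatement("SELECT * FROM orders WHERE id = ?");
        }
        // The query is compiled once, not 1000 times.
        System.out.println(compileCount);
    }
}
```

Detecting that a server lacks this cache (and recompiles on every call) is precisely the kind of diagnosis that requires per-invocation detail rather than aggregate counts.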
So just gathering details for the entire application for a period of time, or applying a broad filter to include only certain classes, will still include too much detail unrelated to the problem. In our example, all we would like to see is the database activity associated with a specific transaction. In Section 2 we introduce the concept of a burst to represent the details associated with a task.

Analyzing a program using bursts requires a methodology for choosing the criteria that define a burst. We rely on the tool user to establish these criteria. Moreover, the level of detail and the tasks of interest may vary as the user validates or disproves each hypothesis about where a problem may lie. At first, the user may not even know the names of the relevant routines; at this point, the user is not interested in seeing the second argument value of the fifth invocation of some method. Eventually, though, the user may need to know such fine details. Yet, by this time, the tool user may know that it is only that second argument of the fifth invocation that is interesting, and not any other argument of other methods. Thus, the tool user iteratively explores tasks.

To accomplish this interactive exploration, our solution exploits the repetitive behavior common in, for example, the increasingly important class of server applications. In this paper, we outline the mechanisms and associated methodology for this style of drive-by analysis: the iterative process of user-assisted gathering of dynamic information from running programs.

∗ Currently at University of British Columbia, mrobilla@cs.ubc.ca.
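The iterative refinement of criteria described above can be sketched as successively narrowed filters over trace events. This is only an illustration of the idea, not the paper's actual burst mechanism: the event fields, thread names, and predicates below are all invented for the sketch.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration of iteratively refined trace criteria:
// a coarse filter (any transaction thread), later narrowed to one
// invocation of one method. All names here are illustrative.
public class TraceFilterDemo {
    record Event(String thread, String method, int invocation, Object arg) {}

    static List<Event> select(List<Event> trace, Predicate<Event> criteria) {
        return trace.stream().filter(criteria).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> trace = List.of(
            new Event("txn-1", "executeQuery", 5, "SELECT ..."),
            new Event("txn-1", "executeQuery", 6, "SELECT ..."),
            new Event("admin", "gcStats", 1, "n/a"));

        // First pass: everything the transaction threads do.
        Predicate<Event> coarse = e -> e.thread().startsWith("txn");

        // Later pass: only the fifth invocation of one specific method.
        Predicate<Event> fine =
            coarse.and(e -> e.method().equals("executeQuery")
                         && e.invocation() == 5);

        System.out.println(select(trace, coarse).size()); // 2
        System.out.println(select(trace, fine).size());   // 1
    }
}
```

Each refinement discards detail that the previous hypothesis has already ruled out, which is what keeps the gathered data proportional to the task rather than to the whole application.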