Paper information - A Practical Application of Sharing and Freeness Inference

Abstract interpretation [6] of logic programs ([1], [14], [8], [19], [2], [4], [13], [5], [12], [15] ...) is currently proposed as a means of obtaining characteristics of a program at compile-time, thus allowing several types of optimizations. However, only a few studies have analyzed the practicality of analyzers in the task they were designed for [23, 12, 22, 21, 3]. This paper offers a preliminary analysis of the effectiveness of an analyzer. It contributes to filling this gap and is novel in both the domain and the application: results are provided for an abstract interpreter based on the sharing + freeness domain presented in [17] and [7], in the application of automatic program parallelization.

The analyzer under study was designed to infer, accurately and concisely, variable groundness, sharing, and freeness information at compile-time for a program and a given query form. The abstract domain approximates this information by combining two components: one provides information on sharing (aliasing, independence) and groundness; the other encodes freeness information. Briefly, the sharing component is essentially the abstract domain of Jacobs and Langen [11]. For efficiency and increased precision, however, the analyzer under study uses the efficient abstract unification and top-down driven abstract interpretation algorithms defined by Muthukumar and Hermenegildo [18] instead of the pure bottom-up approach used by Jacobs and Langen. The freeness component is represented as a list of those program variables which are known to be free.

Variable sharing is not only required in many types of analysis to ensure correctness, but is also quite useful in a number of applications and, in particular, essential in the compile-time detection of strict independence among goals (see [10] and its references), a condition which allows efficient parallelization of programs within the independent and-parallelism model.
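The two components described above can be pictured with a small sketch. It assumes a set-sharing reading of the sharing component (a set of sets of program variables, where a variable that occurs in no set is ground and two variables that never occur together are independent) and a plain list for freeness. The variable names and the helpers `ground_vars`, `may_share`, and `independent_pairs` are hypothetical and are not taken from the analyzer itself.

```python
from itertools import combinations

# Hypothetical abstract state for a clause with variables W, X, Y, Z.
# Sharing component: each inner set names variables that may be bound
# to terms with a common run-time variable.
sharing = {frozenset({"X"}), frozenset({"Y", "Z"})}
# Freeness component: program variables known to be unbound.
free = ["X"]
variables = {"W", "X", "Y", "Z"}


def ground_vars(variables, sharing):
    """Variables that occur in no sharing set are definitely ground."""
    covered = set().union(*sharing)
    return variables - covered


def may_share(a, b, sharing):
    """True if some sharing set contains both variables."""
    return any(a in s and b in s for s in sharing)


def independent_pairs(variables, sharing):
    """Pairs of variables that cannot share a run-time variable."""
    return {
        (a, b)
        for a, b in combinations(sorted(variables), 2)
        if not may_share(a, b, sharing)
    }


ground = ground_vars(variables, sharing)
pairs = independent_pairs(variables, sharing)
```

In this toy state, `W` is ground, `Y` and `Z` may be aliased, and every other pair of variables is independent, so goals touching only `X` and goals touching only `Y` could be candidates for parallel execution.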
Informally, strict independence states that a set of goals can run in parallel if they do not share any variable at run-time. Freeness information itself is also useful in a number of applications and essential in the detection of non-strict independence [10] among goals, a condition which extends strict independence. Furthermore, more accurate information is achieved in each of the domains by allowing communication between the two domains at some points of the analysis.

Both the accuracy of the information gathered by the interpreter and its effectiveness are evaluated during its use in the actual task of automatic parallelization of logic programs, while the interpreter is embedded in a real parallel logic programming system: &-Prolog [9]. These parameters are evaluated in terms of ultimate performance, i.e., the speedup obtained with respect to the sequential version of the program.

2. Overview of the Evaluation Environment

The &-Prolog system comprises a parallelizing compiler aimed at uncovering independent and-parallelism and an execution model/run-time system aimed at exploiting such parallelism. Prolog code is parallelized automatically by the compiler. Compiler switches determine whether or not code will be parallelized and through which type of analysis.

The &-Prolog language is a vehicle for expressing and implementing strict and non-strict independent and-parallelism. &-Prolog is essentially Prolog, with the addition of the parallel conjunction operator "&", a set of parallelism-related builtins (which includes several types of groundness and independence checks), and synchronization primitives. For syntactic convenience, an additional construct is also provided: the Conditional Graph Expression (CGE). A CGE has the general form ( i-cond => goal_1 & goal_2 & ... & goal_N ), where the goal_i are either normal Prolog goals or other CGEs and i-cond is a condition which, if satisfied, guarantees the mutual independence of the goal_i.
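The guarded shape of a CGE can be mimicked outside Prolog. The following Python sketch is only an analogy: goals are modeled as plain callables, the condition as a boolean function, and `run_cge`, `disjoint`, and the variable sets are hypothetical names introduced for illustration. It does not model unification, backtracking, or the actual &-Prolog run-time system.

```python
from concurrent.futures import ThreadPoolExecutor


def run_cge(i_cond, goals):
    """Run goals concurrently if the guard holds, otherwise in order.

    Hypothetical helper: returns the mode used and the goal results
    in left-to-right order.
    """
    if i_cond():
        with ThreadPoolExecutor(max_workers=max(1, len(goals))) as pool:
            futures = [pool.submit(goal) for goal in goals]
            return "parallel", [f.result() for f in futures]
    return "sequential", [goal() for goal in goals]


def disjoint(*var_sets):
    """Toy independence check: no variable appears in two goals."""
    seen = set()
    for vs in var_sets:
        if seen & vs:
            return False
        seen |= vs
    return True


# Two toy goals, each reading its own data.
data = {"X": [1, 2, 3], "Y": [4, 5]}
goals = [lambda: sum(data["X"]), lambda: sum(data["Y"])]

mode_ok, results_ok = run_cge(lambda: disjoint({"X"}, {"Y"}), goals)
mode_no, results_no = run_cge(lambda: disjoint({"X"}, {"X", "Y"}), goals)
```

Both calls compute the same results; only the execution mode differs, which mirrors the intent that the guard affects how the goals are run and not what they compute.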
The operational meaning of the CGE is "check i-cond; if it succeeds, execute the goal_i in parallel, otherwise execute them sequentially."

There are three different annotators in the &-Prolog system: the CDG, the UDG, and the MEL annotators, whose algorithms are defined in [16]. The CDG annotator seeks to maximize the amount of parallelism available in a clause, without being concerned with the size of the resultant &-Prolog expression. In doing this, the annotator may switch the positions of independent goals. The UDG annotator does essentially the same as the CDG annotator, except that only unconditional parallelism is exploited, i.e., only goals which can be determined to be independent at compile-time are run in parallel. The MEL annotator seeks to maximize the number of goals to be run in parallel within a CGE, preserving the left-to-right order of subgoals in its expressions. The two abstract interpreters which will be used in the evaluation are the sharing + freeness interpreter that is the object of this study and the sharing-only interpreter of [18].

The &-Prolog system can optionally generate a trace file during an execution. This file is an encoded description of the events that occurred during the execution of a parallelized program. Since &-Prolog generates all possible parallel tasks during execution of a parallel program, even if there are only a few processors in the system, all possible parallel program graphs, with their exact execution times, can be constructed from this data. A tool has been devised and implemented which takes as input a real execution trace file of a parallel program run on the &-Prolog system, and gives as a result a new optimized trace file which corresponds to the best possible execution which would have occurred assuming a system with an infinite number of processors. It also provides statistics about the speedup obtained and the number of processors needed to achieve it.
Since this "ideal" parallel execution uses as data a real execution trace file in which real execution times of sequential segments and all delay times are taken into account, it is possible to consider the results as a very good approximation to the best possible parallel execution.

Two broad categories of programs were used for the tests: simple programs and larger ones. Program selection within both categories has been performed taking into account the programs used in those studies with which the results of our tests are going to be compared.
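The idea behind the "ideal" execution described above can be sketched as a critical-path computation over a task graph. This is not the authors' tool and does not use its trace format: the task names, durations, and dependencies below are made up, any delay times are assumed to be folded into the durations, and `ideal_schedule` and `peak_processors` are hypothetical helpers.

```python
def ideal_schedule(durations, deps):
    """Earliest finish time of each task with unlimited processors."""
    finish = {}

    def finish_time(task):
        if task not in finish:
            # A task starts as soon as all of its predecessors finish.
            start = max((finish_time(d) for d in deps.get(task, ())), default=0)
            finish[task] = start + durations[task]
        return finish[task]

    for task in durations:
        finish_time(task)
    return finish


def peak_processors(durations, finish):
    """Largest number of tasks running at once in the ideal schedule."""
    events = []
    for task, end in finish.items():
        events.append((end - durations[task], 1))   # task starts
        events.append((end, -1))                    # task ends
    busy = peak = 0
    # At equal times, ends (-1) sort before starts (+1).
    for _, delta in sorted(events):
        busy += delta
        peak = max(peak, busy)
    return peak


# Made-up trace: task "a" forks "b" and "c", which join in "d".
durations = {"a": 2, "b": 3, "c": 4, "d": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}

finish = ideal_schedule(durations, deps)
sequential_time = sum(durations.values())
ideal_time = max(finish.values())
speedup = sequential_time / ideal_time
processors = peak_processors(durations, finish)
```

For this toy graph the sequential time is 10 units, the ideal time is 7 units, and two processors are enough to reach that ideal, which gives a speedup of 10/7.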

Manuel V. Hermenegildo | Maria Garcia de la Banda