Performance analysis of parallel processing systems

A centralized parallel processing system with job splitting is considered. In such a system, jobs wait in a central queue, which is accessible by all the processors, and are split into independent tasks that can be executed on separate processors. This parallel processing system is modeled as a bulk arrival M[X]/M/c queueing system in which customers and bulks correspond to tasks and jobs, respectively. Such a system has been studied in [1, 3], where an expression for the mean response time of a random customer was obtained. However, since we are interested in the time that a job spends in the system, including synchronization delay, we must evaluate the bulk response time rather than simply the customer response time. The job response time is the sum of the job waiting time and the job service time. By analyzing the bulk queueing system we obtain an expression for the mean job waiting time; the mean job service time is given by a set of recurrence equations.

To compare this system with other parallel processing systems, the following four models are considered: Distributed/Splitting (D/S), Distributed/No Splitting (D/NS), Centralized/Splitting (C/S), and Centralized/No Splitting (C/NS). In each of these systems there are c processors, jobs are assumed to consist of a set of tasks that are independent and have exponentially distributed service requirements, and jobs are assumed to arrive from a Poisson point source. The systems differ in how jobs queue for the processors and in how jobs are scheduled on the processors. The queueing of jobs is distributed if each processor has its own queue, and centralized if there is a common queue for all the processors. The scheduling of jobs is no splitting if, once a job is scheduled, the entire set of tasks composing it runs sequentially on the same processor.
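The C/S model described above can be sketched with a small event-driven simulation (an illustrative sketch, not the paper's analysis; the parameter values below are arbitrary, and job response time is measured from arrival until the last task of the job finishes, so the synchronization delay is included):

```python
import heapq
import random
from collections import deque

def simulate_cs(lam, mu, k, c, n_jobs, seed=0):
    """Event-driven sketch of the C/S system: jobs arrive as a
    Poisson(lam) stream, are split into k independent Exp(mu) tasks
    placed in a central FCFS task queue, and are served by c
    processors; a job departs when its last task finishes, so the
    measured response time includes the synchronization delay."""
    rng = random.Random(seed)
    task_q = deque()          # waiting tasks, identified by job id
    in_service = []           # heap of (departure_time, job_id)
    free = c                  # idle processors
    remaining = {}            # job id -> unfinished task count
    arrived_at = {}           # job id -> arrival time
    resp = []                 # completed-job response times
    t = 0.0
    next_arrival = rng.expovariate(lam)
    jobs_in = 0
    while len(resp) < n_jobs:
        if jobs_in < n_jobs and (not in_service
                                 or next_arrival <= in_service[0][0]):
            t = next_arrival                      # job arrival
            arrived_at[jobs_in] = t
            remaining[jobs_in] = k
            task_q.extend([jobs_in] * k)          # split into k tasks
            jobs_in += 1
            next_arrival = t + rng.expovariate(lam)
        else:
            t, j = heapq.heappop(in_service)      # task departure
            free += 1
            remaining[j] -= 1
            if remaining[j] == 0:                 # last task: job done
                resp.append(t - arrived_at[j])
        while free > 0 and task_q:                # dispatch waiting tasks
            j = task_q.popleft()
            heapq.heappush(in_service, (t + rng.expovariate(mu), j))
            free -= 1
    return sum(resp) / len(resp)

# server utilization is rho = lam * k / (c * mu); rho = 0.5 here
mean_T = simulate_cs(lam=1.0, mu=1.0, k=4, c=8, n_jobs=20000, seed=1)
```

Such a simulation is useful only as a sanity check on the analytical results; the paper's contribution is the exact expression for the mean job waiting time and the recurrence equations for the mean job service time.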
On the other hand, the scheduling is splitting if the tasks of a job are scheduled so that they can be run independently, and potentially in parallel, on different processors. In the splitting case a job is completed only when all of its tasks have finished execution. In our study we compare the mean response time of jobs in each of the systems for differing values of the number of processors, the number of tasks per job, the server utilization, and certain overheads associated with splitting up a job. The M[X]/M/c system studied in the first part of the paper corresponds to the C/S system; in this system, as processors become free they serve the first task in the queue. The distributed systems (D/S and D/NS) are studied in [2]. We use the approximate analysis of the D/S system and the exact analysis of the D/NS system given in that paper. For systems with 32 processors or fewer, the relative error in the approximation for the D/S system was found to be less than 5 percent. In the D/NS system, jobs are assigned to processors with equal probabilities. The approximation we use for the mean job response time of the C/NS system is found in [4]. Although an extensive error analysis of this approximation over all parameter ranges has not been carried out, the largest relative error for the M/E2/10 system reported in [4] is about 0.1 percent. For all values of the utilization ρ, our results show that the splitting systems yield lower mean job response times than the no splitting systems. This follows from the fact that, in the splitting case, work is distributed over all the processors. For any ρ, the lowest mean job response time is achieved by the C/S system and the highest by the D/NS system. The relative performance of the D/S system and the C/NS system depends on the value of ρ. For small ρ, the parallelism achieved by splitting jobs into parallel tasks in the D/S system reduces its mean job response time as compared to the C/NS system, where tasks of the same job are executed sequentially.
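The light-load advantage of splitting can be made concrete with a standard fact about exponential order statistics (a zero-load illustration, not taken from the paper): the maximum of k independent Exp(μ) task times has mean H_k/μ, where H_k is the k-th harmonic number, whereas running the same k tasks sequentially takes k/μ on average.

```python
from fractions import Fraction

def light_load_times(k, mu=1.0):
    """Zero-load comparison for a job of k independent Exp(mu) tasks:
    splitting across k idle processors finishes when the slowest task
    does, in expected time H_k/mu; running the tasks sequentially on
    one processor takes k/mu."""
    h_k = float(sum(Fraction(1, i) for i in range(1, k + 1)))
    return h_k / mu, k / mu

split, seq = light_load_times(4)
# split = H_4 = 25/12 ~ 2.083, seq = 4.0: splitting roughly halves
# the light-load response time for k = 4
```

As utilization grows, queueing and synchronization delays erode this advantage, which is why the D/S versus C/NS ranking reverses at high ρ.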
However, for high ρ, the C/NS system has a lower mean job response time than the D/S system. This is due to the long synchronization delay incurred in the D/S system at high utilizations. The effect of parallelism on the performance of parallel processing systems is studied by comparing the performance of the C/NS system to that of the C/S system. The performance improvement obtained by splitting jobs into tasks is found to decrease with increasing utilization. For a fixed number of processors and fixed ρ, we find that increasing the number of tasks per job, i.e., increasing the parallelism, increases the mean job response time of the C/NS system relative to that of the C/S system. By considering an overhead delay associated with splitting jobs into independent tasks, we observe that the mean job response time is a convex function of the number of tasks, and thus, for a given arrival rate, there exists a unique optimum number of tasks per job.

We also consider problems associated with partitioning the processors into two sets, each dedicated to one of two classes of jobs: edit jobs and batch jobs. Edit jobs are assumed to consist of simple operations that have no inherent parallelism and thus consist of only one task. Batch jobs, on the other hand, are assumed to be inherently parallel and can be broken up into tasks. All tasks from either class are assumed to have the same service requirements. A number of interesting phenomena are observed. For example, when half the jobs are edit jobs, the mean job response time for both classes of jobs increases if one processor is allocated to edit jobs. Improvement for edit jobs, at the cost of increasing the mean job response time of batch jobs, results only when the number of processors allocated to edit jobs is increased to two. These and other results suggest that it is desirable for parallel processing systems to have a controllable boundary for processor partitioning.
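The convexity observation, and the resulting unique optimum number of tasks per job, can be illustrated with a deliberately simplified toy model (not the paper's model): suppose splitting a job into k tasks gives ideal parallel service time s/k but adds a linear overhead a·k. The total is convex in k, so an interior minimum exists; the continuous minimizer is k* = sqrt(s/a).

```python
def toy_response(k, s=16.0, a=1.0):
    """Toy model (not the paper's): ideal parallel service time s/k
    plus a linear per-task splitting overhead a*k."""
    return s / k + a * k

# toy_response is convex in k, so searching the integers finds the
# unique optimum; the continuous minimizer is k* = sqrt(s/a).
best_k = min(range(1, 33), key=toy_response)
# with s = 16, a = 1: best_k = 4 and toy_response(4) = 8.0
```

In the paper the trade-off has the same shape but arises from the full queueing analysis: more tasks per job means more parallelism but also more splitting overhead and synchronization delay.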