Algorithms for parallel rendering

This dissertation investigates which parallel algorithms and architectures are most likely to enable real-time rates for "realistic" renderers, that is, renderers that support such features as high-level geometric primitives, tessellation for accurate representation of curved surfaces, and anti-aliasing by oversampling. Parallel rendering algorithms are distinguished by where in the graphics pipeline they redistribute data in order to "sort" from object-space to image-space: very early in the pipeline (sort-first), after the pipeline's natural transition to image-space (sort-middle), or after pixels are calculated (sort-last). The main modeling results give a tradeoff among sort-first, sort-middle, and sort-last. In particular, low redistribution costs favor sort-first for "realistic" renderers that support high-level primitives and extensive oversampling, whereas sort-last "realistic" renderers that support extensive oversampling require very substantial network bandwidth. Modeling shows that sort-middle lies between sort-first and sort-last with respect to both costs and benefits. To determine whether these results hold in practice, I have modified RenderMan, a commercial "realistic" rendering package, to support sort-first and sort-last. Sort-first redistribution costs are indeed found to be very low, but, in contrast to previously published results, redundancy is an impediment to scalability. Load imbalance, however, is the more significant cause of inefficiency for moderately sized machines. For sort-last, redistribution costs are indeed a problem if real-time rates are to be achieved. In addition, the particular renderer studied is extremely vulnerable to synchronization costs when implemented as sort-last; however, this is a limitation of the underlying uniprocessor rendering algorithm rather than of sort-last itself.
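Redundancy in sort-first arises because a primitive whose screen-space extent overlaps several processors' screen regions must be processed by each of them. The following is a minimal sketch of how that redundancy can be measured, assuming a uniform grid of rectangular screen tiles with one tile per processor and classification by axis-aligned screen-space bounding box. These are illustrative assumptions, not the partitioning used in the RenderMan study, and the names `BBox`, `tiles_overlapped`, and `redundancy_factor` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Screen-space bounding box of a primitive in pixels: [x0, x1) by [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def tiles_overlapped(box: BBox, screen_w: int, screen_h: int,
                     tiles_x: int, tiles_y: int) -> set[tuple[int, int]]:
    """Return the (tx, ty) tiles of a uniform tiles_x-by-tiles_y grid that the box touches."""
    # Clip the box to the screen; primitives entirely off-screen go nowhere.
    x0, y0 = max(box.x0, 0), max(box.y0, 0)
    x1, y1 = min(box.x1, screen_w), min(box.y1, screen_h)
    if x0 >= x1 or y0 >= y1:
        return set()
    # Integer tile index of a pixel column/row: pixel * tiles // screen size.
    tx0, tx1 = x0 * tiles_x // screen_w, (x1 - 1) * tiles_x // screen_w
    ty0, ty1 = y0 * tiles_y // screen_h, (y1 - 1) * tiles_y // screen_h
    return {(tx, ty) for tx in range(tx0, tx1 + 1) for ty in range(ty0, ty1 + 1)}


def redundancy_factor(boxes: list[BBox], screen_w: int, screen_h: int,
                      tiles_x: int, tiles_y: int) -> float:
    """Average number of processors (tiles) each on-screen primitive is sent to.

    1.0 means no redundant work; larger values mean primitives straddling
    tile boundaries are transformed and tessellated by several processors.
    """
    counts = [len(tiles_overlapped(b, screen_w, screen_h, tiles_x, tiles_y))
              for b in boxes]
    visible = [c for c in counts if c > 0]
    return sum(visible) / len(visible) if visible else 0.0
```

On a 512×512 screen, a 64×64-pixel primitive at (100, 100) falls in a single tile of a 2×2 grid but touches four tiles of a 4×4 grid and nine tiles of a 16×16 grid. Under these assumptions, partitioning the screen more finely to feed more processors replicates more per-primitive work, which is one way redundancy can limit sort-first scalability.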
Thus, sort-first is a good candidate for "realistic" rendering except for its load imbalance, and sort-last is a good candidate except for its redistribution costs. Combining theoretical and empirical findings, I propose and implement a hybrid algorithm that employs sort-last to balance load in sort-first; this hybrid is found to be effective at reducing load imbalance. To address sort-last redistribution costs, I propose and study two techniques that reduce network bandwidth requirements: the inclusion of local z-buffers, and the use of snooping on a shared bus to reduce the number of pixels that must be redistributed. These techniques are investigated through modeling and trace-driven simulation; the first is shown to be effective at reducing network bandwidth requirements, and the second to be extremely effective. This dissertation concludes by identifying further problems that must be solved if "real-time realistic rendering" is to be realized. (Abstract shortened by UMI.)
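How the two bandwidth-reduction techniques cut sort-last traffic can be illustrated with a small trace-driven count of pixels that cross the interconnect. This is a minimal sketch under simplifying assumptions made here for illustration, not the dissertation's simulator: each processor's trace is a list of `(x, y, depth)` fragments with smaller depth meaning closer, a local z-buffer keeps only the closest fragment per pixel on each processor, and with snooping, processors transmit in index order while every processor watches the shared bus and suppresses any pixel for which an equal-or-closer depth has already been broadcast. All function names are hypothetical.

```python
Fragment = tuple[int, int, float]  # (x, y, depth); smaller depth is closer


def pixels_sent_no_zbuffer(frags_per_proc: list[list[Fragment]]) -> int:
    """Baseline sort-last: every rasterized fragment crosses the network."""
    return sum(len(frags) for frags in frags_per_proc)


def local_winners(frags: list[Fragment]) -> dict[tuple[int, int], float]:
    """Resolve one processor's fragments in a local z-buffer (closest depth per pixel)."""
    zbuf: dict[tuple[int, int], float] = {}
    for x, y, z in frags:
        if (x, y) not in zbuf or z < zbuf[(x, y)]:
            zbuf[(x, y)] = z
    return zbuf


def pixels_sent_local_zbuffer(frags_per_proc: list[list[Fragment]]) -> int:
    """Each processor sends only its local z-buffer survivors."""
    return sum(len(local_winners(frags)) for frags in frags_per_proc)


def pixels_sent_snooping(frags_per_proc: list[list[Fragment]]) -> int:
    """Local z-buffers plus snooping on a shared bus (assumed transmit order: by processor index)."""
    best_on_bus: dict[tuple[int, int], float] = {}  # closest depth broadcast so far
    sent = 0
    for frags in frags_per_proc:
        for pixel, z in local_winners(frags).items():
            # Snooped state shows a closer (or equal) pixel already sent: skip it.
            if pixel not in best_on_bus or z < best_on_bus[pixel]:
                best_on_bus[pixel] = z
                sent += 1
    return sent
```

In the trace used below, three processors rasterize eight fragments; local z-buffers cut the traffic to seven pixels, and snooping cuts it to five, because later processors drop pixels already covered by closer ones on the bus. The actual savings depend on the scene, the transmit order, and the bus protocol, none of which this sketch models in detail.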