Efficient Hardware for Tile-Based Rasterization

An efficient logic-enhanced memory architecture is presented that solves existing problems associated with tile-based hardware rasterization algorithms for 3D graphics. The memory contains one bit per tile pixel, and during rasterization a systolic primitive scan-conversion subsystem fills it, in several clock cycles, with the stencil of the primitive: ones are written to memory locations that represent tile pixels covered by the primitive, and zeros are stored elsewhere. Once the shape of the primitive has been encoded in the memory, the memory's internal logic can deliver, on request, up to four hit positions (tile positions inside the primitive) per clock cycle to the pixel processing pipelines, and it signals when all hit positions have been consumed. With the proposed memory architecture, no search overhead is needed to find the first hit position inside a primitive. Furthermore, “ghost” primitives are handled efficiently, with a small constant delay irrespective of primitive size. Finally, hit positions (communicated in a spatial pattern to increase texture cache hit ratios) can always be mapped to different memory banks in the Z-buffer or color buffer, breaking the “read-modify-write” dependency associated with depth test and color blending. A hardware implementation in a commercial 0.18 μm process technology, for a QVGA 3D graphics hardware accelerator with a tile size of 32×16 pixels, indicates that the memory can be clocked at 200 MHz and occupies an area of 120,000 μm².

Keywords— 3D graphics architectures; tile-based rasterization; embedded systems; memory architectures.
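
The hit-delivery behavior described in the abstract can be illustrated with a small software model. The sketch below is only a behavioral approximation under stated assumptions, not the hardware design: the names build_stencil and pop_hits are hypothetical, the stencil is held in one Python integer with one bit per tile pixel, and hits are returned in plain raster order rather than in the spatial, bank-aware pattern the abstract mentions. Each call to pop_hits stands in for one clock cycle that delivers up to four hit positions and reports whether the stencil is exhausted.

```python
TILE_W, TILE_H = 32, 16  # tile size quoted in the abstract


def build_stencil(covers):
    """Return an int whose bit (y * TILE_W + x) is 1 where covers(x, y) is true.

    Software stand-in for the scan-conversion step; 'covers' is a
    hypothetical coverage predicate supplied by the caller.
    """
    stencil = 0
    for y in range(TILE_H):
        for x in range(TILE_W):
            if covers(x, y):
                stencil |= 1 << (y * TILE_W + x)
    return stencil


def pop_hits(stencil, max_hits=4):
    """Model one request: remove up to max_hits set bits from the stencil.

    Returns (hits, remaining_stencil, done), where hits is a list of (x, y)
    tile positions in raster order (an assumption of this sketch) and done
    is True once no hit positions remain.
    """
    hits = []
    while stencil and len(hits) < max_hits:
        lowest = stencil & -stencil       # isolate the lowest set bit
        index = lowest.bit_length() - 1   # bit index of that hit
        hits.append((index % TILE_W, index // TILE_W))
        stencil ^= lowest                 # mark the hit as consumed
    return hits, stencil, stencil == 0


def in_box(x, y):
    """Example primitive: a 3x2 pixel box inside the tile."""
    return 2 <= x < 5 and 3 <= y < 5


box_stencil = build_stencil(in_box)
first_hits, remaining, first_done = pop_hits(box_stencil)
second_hits, remaining, second_done = pop_hits(remaining)

# A "ghost" primitive covers no pixel of the tile: its stencil is zero,
# so the very first request already reports that nothing is left.
ghost_hits, _, ghost_done = pop_hits(build_stencil(lambda x, y: False))
```

In this model the six-pixel box is drained in two requests, and the ghost primitive is recognized on the first request regardless of how large the primitive is outside the tile, which mirrors the constant-delay property claimed in the abstract.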