Auto-optimization of a Feature Selection Algorithm

Advanced visualization algorithms are typically computationally expensive but highly data parallel, which makes them attractive candidates for GPU architectures. However, porting an algorithm to a GPU remains a challenging process. The Mint programming model addresses this issue with a simple, high-level interface, targeting users who seek real-time performance without investing significant programming effort. In this work, we present the automatic CUDA parallelization and optimization of the Harris interest point detection algorithm with Mint, which translates annotated C source into highly optimized CUDA C. For four well-known volume rendering datasets, the Mint-generated kernels run in under a second on a Tesla C1060 and deliver, on average, 10 times the performance of OpenMP running with 4 threads on a Nehalem processor.
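To illustrate the programming model, the sketch below shows what a Mint-annotated loop nest for one stage of the Harris detector (accumulating gradient products over a volume) might look like, assuming the pragma spellings reported in the Mint literature (#pragma mint copy, #pragma mint parallel, and #pragma mint for with nest and tile clauses). The array names, tile sizes, and the central-difference stencil are illustrative and not taken from this work; a plain C compiler simply ignores the pragmas.

/* Illustrative sketch only: pragma clause spellings, array names, and
 * tile sizes are assumptions, not the paper's actual code. */
void gradient_products(int nx, int ny, int nz,
                       const float *vol, float *ixx, float *iyy, float *izz)
{
    #pragma mint copy(vol, toDevice, nx, ny, nz)
    #pragma mint parallel
    {
        /* nest(all) asks Mint to map the whole loop nest to one CUDA
         * kernel; tile(16,16,1) is an illustrative thread-block shape. */
        #pragma mint for nest(all) tile(16, 16, 1)
        for (int z = 1; z < nz - 1; z++)
            for (int y = 1; y < ny - 1; y++)
                for (int x = 1; x < nx - 1; x++) {
                    int i = (z * ny + y) * nx + x;
                    /* central-difference gradients of the volume */
                    float gx = 0.5f * (vol[i + 1] - vol[i - 1]);
                    float gy = 0.5f * (vol[i + nx] - vol[i - nx]);
                    float gz = 0.5f * (vol[i + nx * ny] - vol[i - nx * ny]);
                    /* gradient products entering the Harris structure tensor */
                    ixx[i] = gx * gx;
                    iyy[i] = gy * gy;
                    izz[i] = gz * gz;
                }
    }
    #pragma mint copy(ixx, fromDevice, nx, ny, nz)
    #pragma mint copy(iyy, fromDevice, nx, ny, nz)
    #pragma mint copy(izz, fromDevice, nx, ny, nz)
}

Under this usage pattern, the programmer keeps a single annotated C source, and the translator is responsible for generating the CUDA kernel and the host-device data transfers implied by the copy directives.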