CUDA dynamic parallelism

This chapter introduces CUDA dynamic parallelism, an extension to the CUDA programming model that enables a CUDA kernel to create new thread grids by launching new kernels. Dynamic parallelism allows algorithms that discover new work at run time to prepare and launch kernels without burdening the host or resorting to complex software techniques. The chapter starts with a simple pattern that benefits from dynamic parallelism. It then presents the essential concepts required for the practical use of dynamic parallelism: memory data visibility, device configuration, memory management, synchronization, streams, and events. Finally, it uses two advanced examples, Bezier curve calculation and quadtree construction, to illustrate some of the important subtleties and details that programmers are likely to encounter when using dynamic parallelism in real applications.