Adjoint Algorithmic Differentiation of a GPU Accelerated Application

We consider a GPU accelerated program using Monte Carlo simulation to price a basket call option on 10 FX rates driven by a 10 factor local volatility model. We develop an adjoint version of this program using algorithmic differentiation. The code uses mixed precision. For our test problem of 10,000 sample paths with 360 Euler time steps, we obtain a runtime of 522ms to compute the gradient of the price with respect to the 438 input parameters, the vast majority of which are the market observed implied volatilities (the equivalent single threaded tangent-linear code on a CPU takes 2hrs).