Parallelism in the Reverse Mode

In the basic form of the reverse mode for computing derivatives, the memory needed to record intermediate values can become excessively large for problems of practical interest. Sequential checkpointing schemes can reduce this memory requirement dramatically, but they may increase the run time significantly. Implementing suitable checkpointing schemes on multiprocessor systems can reduce the run time to its theoretical minimum. Among the many possible scheduling strategies, we develop one that minimizes resource requirements. We present different communication structures that depend on the memory architecture of the multiprocessor system and on the available resources. We also estimate the limits on the complexity and the memory requirements of the problem function.
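The memory versus run-time trade-off of sequential checkpointing can be made concrete with a small sketch. It assumes a hypothetical problem function built from n identical scalar steps `x -> sin(x) + 0.5*x` and compares the basic reverse mode, which records every intermediate value, with a simple uniform checkpointing scheme that stores every k-th value and recomputes each segment once during the reverse sweep. The uniform scheme, the function names, and the step function are illustrative assumptions. They are not the scheduling strategy developed in this work, and the sketch is purely sequential.

```python
import math

def step(x):
    # One elementary step of a hypothetical problem function (assumption).
    return math.sin(x) + 0.5 * x

def step_derivative(x):
    # Local derivative of step(), evaluated at the step's input value.
    return math.cos(x) + 0.5

def reverse_store_all(x0, n):
    """Basic reverse mode: record all n+1 intermediate values."""
    tape = [x0]
    for _ in range(n):
        tape.append(step(tape[-1]))
    adj = 1.0  # adjoint of the output x_n
    for i in range(n - 1, -1, -1):
        adj *= step_derivative(tape[i])
    # derivative, values held in memory, forward steps evaluated
    return adj, len(tape), n

def reverse_checkpointed(x0, n, k):
    """Uniform checkpointing (illustrative): keep every k-th value, then
    re-record one segment at a time during the reverse sweep."""
    assert n >= 1 and k >= 1
    forward_steps = 0
    checkpoints = {0: x0}
    x = x0
    for i in range(n):
        x = step(x)
        forward_steps += 1
        if (i + 1) % k == 0 and i + 1 < n:
            checkpoints[i + 1] = x
    adj = 1.0
    peak = 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        # Recompute the inputs x_start .. x_{end-1} of this segment only.
        seg = [checkpoints[start]]
        for _ in range(start, end - 1):
            seg.append(step(seg[-1]))
            forward_steps += 1
        peak = max(peak, len(checkpoints) + len(seg))
        # Reverse sweep through the segment, last step first.
        for xi in reversed(seg):
            adj *= step_derivative(xi)
    return adj, peak, forward_steps

d1, mem1, steps1 = reverse_store_all(0.3, 100)
d2, mem2, steps2 = reverse_checkpointed(0.3, 100, 10)
print(mem1, steps1)  # 101 100
print(mem2, steps2)  # 20 190
```

With n = 100 and k = 10, both variants produce the same derivative, but the checkpointed version holds at most 20 values instead of 101, at the cost of 190 forward step evaluations instead of 100. This is the kind of extra run time that a parallel schedule aims to hide.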