Optimizing for Reacting Navier-Stokes Equations

The optimizations discussed in this chapter significantly improved concurrency on both Intel Xeon Phi coprocessors and Intel Xeon processors. On the coprocessor, OpenMP scaling with 240 threads relative to one thread rose from 38x in the first version to 100x. On the processor, scaling improved from 10x to 16x. The chapter covers three areas: source modifications that transform a fine-grain thread-parallel approach into a more coarse-grain one, memory allocation considerations on Intel Xeon Phi coprocessors, and source transformations that improve vectorization. It also briefly demonstrates how new features in VTune Amplifier XE can be used for OpenMP analysis.
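The following is a minimal C/OpenMP sketch of the fine-grain versus coarse-grain distinction on a toy 1-D problem. It is not the chapter's solver. The names `reaction_term`, `diffusion_term`, `advance_fine_grain`, and `advance_coarse_grain` are hypothetical, and the linear decay and second-difference terms are illustrative stand-ins for chemistry and transport kernels. The fine-grain version opens a new parallel region for every loop of every time step. The coarse-grain version hoists a single parallel region outside the time loop and shares each step's work with `omp for`.

```c
#include <stddef.h>

/* Hypothetical point-wise "chemistry" term: a simple linear decay. */
static double reaction_term(double y) { return -0.5 * y; }

/* Hypothetical 1-D second-difference "transport" term, zero-gradient ends. */
static double diffusion_term(const double *y, size_t i, size_t n) {
    double left  = (i > 0)     ? y[i - 1] : y[i];
    double right = (i + 1 < n) ? y[i + 1] : y[i];
    return left - 2.0 * y[i] + right;
}

/* Fine-grain: each loop opens its own parallel region, so the thread team
   forks and joins twice per time step. Returns the buffer holding the result. */
double *advance_fine_grain(double *a, double *b, double *rhs,
                           size_t n, int nsteps, double dt) {
    double *cur = a, *next = b;
    for (int s = 0; s < nsteps; ++s) {
        #pragma omp parallel for
        for (size_t i = 0; i < n; ++i)
            rhs[i] = reaction_term(cur[i]) + diffusion_term(cur, i, n);

        #pragma omp parallel for
        for (size_t i = 0; i < n; ++i)
            next[i] = cur[i] + dt * rhs[i];

        double *tmp = cur; cur = next; next = tmp;  /* serial pointer swap */
    }
    return cur;
}

/* Coarse-grain: one parallel region spans the whole time loop. Every thread
   runs the step loop; "omp for" splits each step's points among threads,
   and its implicit barrier keeps step s+1 from reading unfinished data. */
double *advance_coarse_grain(double *a, double *b,
                             size_t n, int nsteps, double dt) {
    #pragma omp parallel
    {
        for (int s = 0; s < nsteps; ++s) {
            /* Buffers are chosen from step parity, so no shared swap is needed. */
            const double *cur = (s % 2 == 0) ? a : b;
            double *next      = (s % 2 == 0) ? b : a;

            #pragma omp for schedule(static)
            for (size_t i = 0; i < n; ++i)
                next[i] = cur[i] + dt * (reaction_term(cur[i]) +
                                         diffusion_term(cur, i, n));
        }
    }
    return (nsteps % 2 == 0) ? a : b;
}
```

Built with `-fopenmp` (GCC/Clang) or `-qopenmp` (Intel compilers), both functions should produce the same values. Without those flags the pragmas are ignored and the code runs serially. The coarse-grain form pays the fork/join cost once per call instead of twice per step. Fusing the two loops is an extra choice made in this sketch, not necessarily the chapter's, and it also removes the `rhs` scratch array.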