Parallel optimization techniques based on GPU

Standard parallel algorithms do not run efficiently on GPUs. Taking the reduction algorithm as an example, this paper introduces four parallel optimization methods for NVIDIA graphics processing units (GPUs) that support the CUDA architecture: instruction optimization, shared memory conflict avoidance, loop unrolling, and thread overload optimization. The experimental results show that parallel optimization can significantly increase GPU computing speed: the optimized reduction algorithm is 34 times faster than the standard parallel algorithm and 70 times faster than the CPU-based implementation.
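To make the reduction example concrete, the following is a minimal host-side C++ sketch, not the paper's CUDA kernels. It emulates the tree reduction performed by one thread block and contrasts two indexing schemes: interleaved addressing, where active threads are selected with a modulo test, and sequential addressing, where contiguous threads read contiguous elements, the pattern commonly used to avoid shared memory bank conflicts. The function names `reduce_interleaved` and `reduce_sequential`, the `shared` vector standing in for shared memory, and the power-of-two block size are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side emulation of one thread block's tree reduction.
// `shared` stands in for the block's shared-memory array, each inner-loop
// iteration stands in for one GPU thread, and the end of each outer-loop
// iteration stands in for a __syncthreads() barrier.

// Interleaved addressing: active threads are picked with a modulo test,
// so they are scattered across the block (costly modulo, divergent warps).
inline float reduce_interleaved(std::vector<float> shared) {
    const std::size_t n = shared.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // sketch assumes a power-of-two size
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t tid = 0; tid < n; ++tid) {
            if (tid % (2 * stride) == 0) {
                shared[tid] += shared[tid + stride];
            }
        }
    }
    return shared[0];
}

// Sequential addressing: the first `stride` threads stay active and each
// adds the element `stride` positions away, so accesses are contiguous.
inline float reduce_sequential(std::vector<float> shared) {
    const std::size_t n = shared.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // sketch assumes a power-of-two size
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        for (std::size_t tid = 0; tid < stride; ++tid) {
            shared[tid] += shared[tid + stride];
        }
    }
    return shared[0];
}
```

Both functions compute the same sum; they differ only in which emulated threads are active in each round and which elements those threads touch. On a real GPU that difference is what determines warp divergence and shared memory access conflicts. This host-side sketch only reproduces the index patterns, so it cannot show the speedups reported above.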