Deep Learning Workload Analysis Using Code Instrumentation

Deep learning is widely used in machine learning systems to achieve high classification accuracy. However, this accuracy comes at the cost of a large number of complex computations, because hundreds of components in a single network layer are processed simultaneously. Although highly parallel computing devices such as GPUs are designed for high throughput, under-utilization of GPU resources remains a performance bottleneck. In this paper, we use an instrumentation tool to analyze the workload of LeNet, one of the popular deep learning algorithms. We identify the bottleneck of LeNet when it runs on a GPU and propose a method to further accelerate it.
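The link between per-layer work and GPU under-utilization can be illustrated with simple arithmetic. The sketch below is not the paper's instrumentation tool or its measurements. It assumes a simplified LeNet-5 (fully connected conv2, pooling omitted because it performs no multiply-accumulates), a hypothetical mapping of one GPU thread per output element, and a hypothetical `RESIDENT_THREADS` capacity of 2048. Under these assumptions it counts outputs and MACs per layer and estimates what fraction of the threads each layer can keep busy.

```python
from dataclasses import dataclass

# Hypothetical GPU capacity: number of threads resident at once.
# This is an illustrative assumption, not a measured device property.
RESIDENT_THREADS = 2048


@dataclass
class Conv:
    """2-D convolution with square input, stride 1, no padding."""
    name: str
    in_ch: int
    out_ch: int
    k: int
    in_hw: int

    def outputs(self) -> int:
        out_hw = self.in_hw - self.k + 1
        return self.out_ch * out_hw * out_hw

    def macs(self) -> int:
        # Each output element needs in_ch * k * k multiply-accumulates.
        return self.outputs() * self.in_ch * self.k * self.k


@dataclass
class FC:
    """Fully connected layer."""
    name: str
    n_in: int
    n_out: int

    def outputs(self) -> int:
        return self.n_out

    def macs(self) -> int:
        return self.n_in * self.n_out


# Simplified LeNet-5; the LeNet variant analyzed in the paper may differ.
LENET = [
    Conv("conv1", 1, 6, 5, 32),
    Conv("conv2", 6, 16, 5, 14),
    FC("fc1", 16 * 5 * 5, 120),
    FC("fc2", 120, 84),
    FC("fc3", 84, 10),
]


def utilization(layer) -> float:
    # Assumes one thread per output element; returns the busy fraction.
    return min(layer.outputs() / RESIDENT_THREADS, 1.0)


if __name__ == "__main__":
    for layer in LENET:
        print(f"{layer.name:6s} outputs={layer.outputs():5d} "
              f"MACs={layer.macs():7d} util={utilization(layer):.3f}")
```

Under these assumptions conv1 fills every thread, while the fully connected layers produce only 120, 84, and 10 outputs and so leave most of the hypothetical threads idle. A real analysis would replace these estimates with counts gathered by instrumenting the actual GPU kernels.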