Implementation of an image classification CNN using a multi-thread GPU

This study implemented an image classification CNN on a multi-thread GPU with 256 threads, using the CIFAR-10 dataset. Each layer was limited to these 256 threads, across which its computation was allocated and processed in parallel. The CNN's computation took 807 ms.
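The abstract does not say how each layer's work was divided among the 256 threads. As a minimal sketch of one common possibility, the Python code below emulates a strided allocation, in which logical thread t computes output elements t, t + 256, t + 512, and so on. The 3x3 kernel, the single input channel, and all function names are assumptions made for illustration and are not taken from the study. The 32x32 input size matches CIFAR-10 images. The threads are run one after another here, whereas on a GPU they would run concurrently.

```python
NUM_THREADS = 256  # thread count stated in the abstract


def conv_output_element(image, kernel, row, col):
    """Compute one output element of a valid (unpadded) 2D convolution.

    As is usual in CNNs, the kernel is applied without flipping.
    """
    k = len(kernel)
    return sum(
        image[row + i][col + j] * kernel[i][j]
        for i in range(k)
        for j in range(k)
    )


def thread_work(thread_id, image, kernel, out, out_h, out_w):
    """Work of one logical thread: every NUM_THREADS-th output element."""
    for idx in range(thread_id, out_h * out_w, NUM_THREADS):
        row, col = divmod(idx, out_w)
        out[row][col] = conv_output_element(image, kernel, row, col)


def conv_layer_strided(image, kernel):
    """Run one convolution layer with its outputs split across the threads."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    # Sequential stand-in for 256 threads running in parallel (assumption).
    for thread_id in range(NUM_THREADS):
        thread_work(thread_id, image, kernel, out, out_h, out_w)
    return out


def elements_per_thread(total, thread_id):
    """Number of output elements assigned to a given logical thread."""
    return len(range(thread_id, total, NUM_THREADS))


# Hypothetical single-channel 32x32 input and 3x3 kernel of ones.
image = [[r * 32 + c for c in range(32)] for r in range(32)]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
output = conv_layer_strided(image, kernel)
```

With these assumed sizes the layer produces 30 x 30 = 900 output elements, so threads 0 to 131 each compute four elements and threads 132 to 255 each compute three. A layer whose output count is not a multiple of 256 always leaves some threads with one element fewer than the others.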
