Abstract This paper presents an empirical analysis of the performance of popular convolutional neural networks (CNNs) for identifying objects in real-time video feeds. The most widely used CNNs for object detection and object-category classification from images are AlexNet, GoogLeNet, and ResNet50. A variety of image datasets are available for testing the performance of different types of CNNs. The commonly used benchmark datasets for evaluating a convolutional neural network are the ImageNet, CIFAR10, CIFAR100, and MNIST datasets. This study focuses on analyzing the performance of three popular networks: AlexNet, GoogLeNet, and ResNet50. We evaluate them on three of the most popular datasets, ImageNet, CIFAR10, and CIFAR100, since testing a network on a single dataset does not reveal its true capabilities and limitations. It must be noted that videos are not used for training; they serve only as test data. Our analysis shows that GoogLeNet and ResNet50 recognize objects with better precision than AlexNet. Moreover, the performance of trained CNNs varies substantially across different categories of objects, and we therefore discuss the possible reasons for this.
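As a minimal sketch of the kind of per-category comparison the abstract describes, the snippet below computes per-class precision (true positives over all predictions of that class) from predicted and true labels. The label arrays and model names here are hypothetical illustrations, not data from the paper:

```python
from collections import defaultdict

def per_class_precision(y_true, y_pred):
    """Precision for each predicted class: TP / (TP + FP)."""
    tp = defaultdict(int)         # correct predictions per class
    predicted = defaultdict(int)  # total predictions per class
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            tp[p] += 1
    return {c: tp[c] / predicted[c] for c in predicted}

# Hypothetical outputs from two models on the same test frames.
y_true        = ["cat", "dog", "cat", "car", "dog", "car"]
alexnet_pred  = ["cat", "cat", "dog", "car", "dog", "car"]
resnet50_pred = ["cat", "dog", "cat", "car", "dog", "dog"]

print(per_class_precision(y_true, alexnet_pred))   # uneven across classes
print(per_class_precision(y_true, resnet50_pred))  # higher on most classes
```

Comparing these per-class dictionaries, rather than a single overall accuracy, is what exposes the category-dependent behavior the study reports.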