Object detection using hybridization of static and dynamic feature spaces and its exploitation by ensemble classification

This paper presents a learning mechanism based on the hybridization of static and dynamic learning. Given the detection performance of state-of-the-art deep learning techniques and the competitive performance of conventional static learning techniques, we propose exploiting a concatenated (parallel) hybridization of the static and dynamic learning-based feature spaces. This contrasts with the cascaded (series) hybridization topology, in which the initial feature space, produced by a conventional, static, handcrafted feature extraction technique, is then explored by a deep, dynamic, automated learning technique. In the cascaded topology, characteristics already suppressed by the conventional representation cannot be explored by the dynamic learning technique. The proposed technique instead combines the conventional static and deep dynamic representations in a concatenated (parallel) topology to generate an information-rich hybrid feature space. This hybrid feature space may aggregate the strengths of both conventional and deep representations, which are then explored using an appropriate classification technique. We also hypothesize that ensemble classification may better exploit this parallel hybrid view of the feature spaces. To this end, pyramid histogram of oriented gradients (PHOG)-based static learning is used in conjunction with convolutional neural network (CNN)-based deep learning to produce the concatenated hybrid feature space, which is then explored with various state-of-the-art ensemble classification techniques. We assess the proposed hybrid learning system on the publicly available INRIA person and Caltech pedestrian image datasets, and use McNemar’s test to statistically validate its improvement over various contemporary techniques. The validated experimental results show that the proposed hybrid representation yields more effective detection (an AUC of 0.9996 on the INRIA person dataset and 0.9985 on the Caltech pedestrian dataset) than the individual static and dynamic representations.
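The concatenated (parallel) topology described above can be illustrated with a minimal sketch. It assumes the static (PHOG) and deep (CNN) descriptors have already been extracted for the same samples. The function names, the per-block L2 normalization, and the feature dimensions are illustrative assumptions, not details taken from the paper, and random arrays stand in for real descriptors.

```python
import numpy as np


def l2_normalize(features, eps=1e-12):
    """Scale each row (one sample) to unit L2 norm.

    Assumption: normalizing each block keeps either representation from
    dominating the other by scale alone; the paper does not specify this step.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


def build_hybrid_features(static_feats, deep_feats):
    """Concatenate per-sample static and deep descriptors side by side."""
    if static_feats.shape[0] != deep_feats.shape[0]:
        raise ValueError("both feature matrices must describe the same samples")
    return np.hstack([l2_normalize(static_feats), l2_normalize(deep_feats)])


# Placeholder descriptors: 4 samples, hypothetical dimensions 680 and 256.
rng = np.random.default_rng(0)
phog = rng.random((4, 680))  # stand-in for PHOG descriptors
cnn = rng.random((4, 256))   # stand-in for CNN activations

hybrid = build_hybrid_features(phog, cnn)
print(hybrid.shape)  # (4, 936)
```

Each row of `hybrid` could then be passed to any classifier, including an ensemble method. Because both blocks are kept intact, information discarded by one representation can still be present in the other, which is the point of the parallel topology.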
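McNemar's test compares two classifiers evaluated on the same samples by looking only at the cases where they disagree. The sketch below is one common formulation, using a continuity correction and a chi-squared distribution with one degree of freedom. The paper does not state which variant it uses, so the correction, the function name, and the toy labels are assumptions made for illustration.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar's test for two classifiers.

    Returns (b, c, statistic, p_value), where
      b = samples classifier A gets right and B gets wrong,
      c = samples classifier A gets wrong and B gets right.
    """
    b = c = 0
    for truth, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (pa == truth), (pb == truth)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    if b + c == 0:
        # The classifiers never disagree, so there is no evidence of a difference.
        return b, c, 0.0, 1.0
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return b, c, statistic, p_value


# Toy labels (not data from the paper): A is always right, B misses six positives.
y_true = [1] * 10 + [0] * 10
pred_a = list(y_true)
pred_b = [0] * 6 + [1] * 4 + [0] * 10

b, c, stat, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(b, c, round(stat, 3), round(p_value, 4))
```

A small p-value suggests the two classifiers' error rates differ by more than chance would explain. With few disagreements, an exact binomial version of the test is often preferred over the chi-squared approximation.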
