A data-driven study of image feature extraction and fusion

Feature analysis is the extraction and comparison of signals from multimedia data, which can subsequently be semantically analyzed. It is the foundation of many multimedia computing tasks such as object recognition, image annotation, and multimedia information retrieval. In recent decades, considerable research has been devoted to feature analysis. In this work, we use large-scale datasets to conduct a comparative study of four state-of-the-art, representative feature extraction algorithms: color-texture codebook (CT), SIFT codebook, HMAX, and convolutional networks (ConvNet). Our comparative evaluation demonstrates that each feature extraction algorithm has its own advantages and excels in different image categories. We provide key observations to explain where these algorithms excel and why. Based on these observations, we recommend feature extraction principles and identify several pitfalls for researchers and practitioners to avoid. Furthermore, we find that, given a large training dataset with more than 10,000 instances per image category, the four evaluated algorithms can converge to the same high level of category-prediction accuracy. This result supports the effectiveness of the data-driven approach. Finally, based on clues learned from each algorithm's confusion matrix, we devise a fusion algorithm to harvest synergies between these four algorithms and further improve class-prediction accuracy.
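The abstract does not specify how the confusion matrices drive the fusion step, so the following is only an illustrative sketch of one plausible scheme, not the authors' algorithm. It assumes each extractor's classifier emits a single predicted class, and it weights that vote by the per-class precision read off the extractor's validation confusion matrix. The function names `class_precision` and `fuse_predictions` and the toy matrices are hypothetical.

```python
def class_precision(confusion):
    """Per-class precision from a confusion matrix.

    confusion[t][p] is the number of validation items whose true class is t
    and whose predicted class is p (an assumed layout).
    """
    n = len(confusion)
    precisions = []
    for p in range(n):
        column_total = sum(confusion[t][p] for t in range(n))
        # Precision is undefined if the class was never predicted; use 0.0.
        precisions.append(confusion[p][p] / column_total if column_total else 0.0)
    return precisions


def fuse_predictions(predictions, confusions):
    """Precision-weighted vote across extractors (illustrative assumption).

    predictions: one predicted class index per extractor.
    confusions:  the matching validation confusion matrix per extractor.
    """
    n = len(confusions[0])
    scores = [0.0] * n
    for pred, confusion in zip(predictions, confusions):
        # An extractor's vote counts for as much as it has been right
        # in the past when predicting this particular class.
        scores[pred] += class_precision(confusion)[pred]
    # Ties resolve to the lowest class index.
    return max(range(n), key=scores.__getitem__)


# Toy two-class example: one reliable extractor, two unreliable ones.
strong = [[9, 1], [1, 9]]
weak = [[3, 7], [7, 3]]
fused = fuse_predictions([1, 0, 0], [strong, weak, weak])
```

In this toy case the single reliable extractor outweighs the two unreliable ones, which a plain majority vote would not allow. Weighting by per-class rather than overall accuracy reflects the observation that the algorithms excel in different image categories.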
