In recent years, with the rapid development of machine learning and artificial intelligence, the problem of target recognition and classification has made a breakthrough. Single mode data cannot summarize the feature information of the target well, multi-view and multi-modal representation learning become a fast-growing direction in the field of machine learning and data mining. In the case that the single modal data sample is not enough, the multi-modal analysis method can not only make up for the difficult training problem caused by insufficient data, but also enhance the representation ability by extracting the complementary relationship between different view of data. Multi-model can improve the accuracy of target recognition. Therefore, this paper proposes an image feature extraction based on convolutional neural network (CNN) and a feature fusion method in the feature layer to better solve the classification problem. This method mainly consists of three steps. First, a convolutional neural network is trained and used to extract features of the images. Secondly, selecting an appropriate dimension in the extracted image features and fusing it with the basic attribute features of the target. Finally, the newly formed fusion features are used to train the classifier and perform comparative verification on the data set. Through experiments, it is finally found that multi-feature fusion can better represent the target, achieving better results on the target classification and recognition problem.
[1]
Ming Yang,et al.
A Survey of Multi-View Representation Learning
,
2019,
IEEE Transactions on Knowledge and Data Engineering.
[2]
Michael Jones,et al.
An improved deep learning architecture for person re-identification
,
2015,
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3]
Léon Bottou,et al.
Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics
,
2014,
EMNLP.
[4]
Geoffrey E. Hinton,et al.
ImageNet classification with deep convolutional neural networks
,
2012,
Commun. ACM.
[5]
Colin Fyfe,et al.
Kernel and Nonlinear Canonical Correlation Analysis
,
2000,
IJCNN.
[6]
H. Hotelling.
Relations Between Two Sets of Variates
,
1936
.
[7]
Ishwar K. Sethi,et al.
Multimedia content processing through cross-modal association
,
2003,
MULTIMEDIA '03.
[8]
Kevin Gimpel,et al.
Deep Multilingual Correlation for Improved Word Embeddings
,
2015,
NAACL.