论文信息 - Deep Feature Space: A Geometrical Perspective

Deep Feature Space: A Geometrical Perspective

One of the most prominent attributes of Neural Networks (NNs) constitutes their capability of learning to extract robust and descriptive features from high dimensional data, like images. Hence, such an ability renders their exploitation as feature extractors particularly frequent in an abundant of modern reasoning systems. Their application scope mainly includes complex cascade tasks, like multi-modal recognition and deep Reinforcement Learning (RL). However, NNs induce implicit biases that are difficult to avoid or to deal with and are not met in traditional image descriptors. Moreover, the lack of knowledge for describing the intra-layer properties -and thus their general behavior- restricts the further applicability of the extracted features. With the paper at hand, a novel way of visualizing and understanding the vector space before the NNs output layer is presented, aiming to enlighten the deep feature vectors properties under classification tasks. Main attention is paid to the nature of overfitting in the feature space and its adverse effect on further exploitation. We present the findings that can be derived from our models formulation, and we evaluate them on realistic recognition scenarios, proving its prominence by improving the obtained results.

[1] Yu Qiao,et al. A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[2] Tao Wang,et al. Semi-Supervised Discriminative Classification Robust to Sample-Outliers and Feature-Noises , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Xiaogang Wang,et al. Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[4] Antonios Gasteratos,et al. On-line deep learning method for action recognition , 2014, Pattern Analysis and Applications.

[5] Leonidas J. Guibas,et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7] Erik Cambria,et al. Recent Trends in Deep Learning Based Natural Language Processing , 2017, IEEE Comput. Intell. Mag..

[8] Paul A. Viola,et al. Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[9] Ping Lu,et al. Audio-visual emotion fusion (AVEF): A deep efficient weighted approach , 2019, Inf. Fusion.

[10] Svetlana Lazebnik,et al. Active Object Localization with Deep Reinforcement Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11] Björn W. Schuller,et al. The INTERSPEECH 2010 paralinguistic challenge , 2010, INTERSPEECH.

[12] Giorgio Vassallo,et al. A brief introduction to Clifford algebra , 2010 .

[13] Vaibhava Goel,et al. Deep multimodal learning for Audio-Visual Speech Recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[14] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[15] Louis-Philippe Morency,et al. Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Bhiksha Raj,et al. SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[19] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.

[20] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .

[21] Carlos D. Castillo,et al. An Experimental Evaluation of Covariates Effects on Unconstrained Face Verification , 2018, IEEE Transactions on Biometrics, Behavior, and Identity Science.

[22] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[24] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[26] Jaap Suter,et al. Geometric Algebra Primer , 2003 .

[27] Leonidas J. Guibas,et al. A scalable active framework for region annotation in 3D shape collections , 2016, ACM Trans. Graph..

[28] Tara N. Sainath,et al. Deep Learning for Audio Signal Processing , 2019, IEEE Journal of Selected Topics in Signal Processing.

[29] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[30] Antonios Gasteratos,et al. An Active Learning Paradigm for Online Audio-Visual Emotion Recognition , 2019, IEEE Transactions on Affective Computing.

[31] Frank Hutter,et al. A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets , 2017, ArXiv.

[32] Zenghui Wang,et al. Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review , 2017, Neural Computation.

[33] Ling Guan,et al. Recognizing Human Emotional State From Audiovisual Signals , 2008, IEEE Transactions on Multimedia.

[34] F. Xavier Roca,et al. Regularizing CNNs with Locally Constrained Decorrelations , 2016, ICLR.

[35] Le Song,et al. Decoupled Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36] Le Song,et al. Deep Hyperspherical Learning , 2017, NIPS.

[37] Mohammad H. Mahoor,et al. AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild , 2017, IEEE Transactions on Affective Computing.

[38] Cigdem Eroglu Erdem,et al. BAUM-1: A Spontaneous Audio-Visual Face Database of Affective and Mental States , 2017, IEEE Transactions on Affective Computing.

[39] Jian Cheng,et al. Additive Margin Softmax for Face Verification , 2018, IEEE Signal Processing Letters.

[40] George Trigeorgis,et al. End-to-End Multimodal Emotion Recognition Using Deep Neural Networks , 2017, IEEE Journal of Selected Topics in Signal Processing.

[41] Rama Chellappa,et al. HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42] Carlos Busso,et al. IEMOCAP: interactive emotional dyadic motion capture database , 2008, Lang. Resour. Evaluation.

[43] Xing Ji,et al. CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44] Yuanliu Liu,et al. Video-based emotion recognition using CNN-RNN and C3D hybrid networks , 2016, ICMI.

[45] Le Song,et al. Learning towards Minimum Hyperspherical Energy , 2018, NeurIPS.

[46] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[47] Le Song,et al. Regularizing Neural Networks via Minimizing Hyperspherical Energy , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Roberto Cipolla,et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49] Meng Yang,et al. Large-Margin Softmax Loss for Convolutional Neural Networks , 2016, ICML.

[51] Wenwu Zhu,et al. Deep Learning on Graphs: A Survey , 2018, IEEE Transactions on Knowledge and Data Engineering.

[52] Carlos D. Castillo,et al. L2-constrained Softmax Loss for Discriminative Face Verification , 2017, ArXiv.

[53] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54] David Haussler,et al. Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[55] Wen Gao,et al. Learning Affective Features With a Hybrid Deep Model for Audio–Visual Emotion Recognition , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[56] Antonios Gasteratos,et al. Unsupervised Human Detection with an Embedded Vision System on a Fully Autonomous UAV for Search and Rescue Operations , 2019, Sensors.