论文信息 - Estimating (and fixing) the Effect of Face Obfuscation in Video Recognition

Estimating (and fixing) the Effect of Face Obfuscation in Video Recognition

Recent research has shown that faces can be obfuscated in large-scale datasets with a minimal performance impact on image classification and downstream tasks like object recognition. In this paper, we investigate the role of face obfuscation in video classification datasets and quantify a more significant reduction in performance caused by face blurring. To reduce such performance effects, we propose a generalized distillation approach in which a privacy-preserving action recognition network is trained with privileged information given by face identities. We show, through experiments performed on Kinetics-400, that the proposed approach can fully close the performance gap caused by face anonymization.

[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[3] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[4] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[5] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Rita Cucchiara,et al. Image-to-Image Translation to Unfold the Reality of Artworks: An Empirical Analysis , 2019, ICIAP.

[7] Yonglong Tian,et al. Contrastive Representation Distillation , 2019, ICLR.

[8] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10] Jian Yang,et al. DSFD: Dual Shot Face Detector , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Hossein Mobahi,et al. Self-Distillation Amplifies Regularization in Hilbert Space , 2020, NeurIPS.

[12] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.

[13] Simone Calderara,et al. Video action detection by learning graph-based spatio-temporal interactions , 2021, Comput. Vis. Image Underst..

[14] Mitesh M. Khapra,et al. Efficient Video Classification Using Fewer Frames , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Bernhard Schölkopf,et al. Unifying distillation and privileged information , 2015, ICLR.

[16] Sangdoo Yun,et al. A Comprehensive Overhaul of Feature Distillation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17] Bumsub Ham,et al. Learning with Privileged Information for Efficient Image Super-Resolution , 2020, ECCV.

[18] Kaisheng Ma,et al. Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19] Jinwoo Shin,et al. Regularizing Class-Wise Predictions via Self-Knowledge Distillation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Kuk-Jin Yoon,et al. Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] Jiashi Feng,et al. Revisit Knowledge Distillation: a Teacher-free Framework , 2019, ArXiv.

[22] Yong Jae Lee,et al. Learning to Anonymize Faces for Privacy Preserving Action Detection , 2018, ECCV.

[23] Tao Mei,et al. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24] Rauf Izmailov,et al. Learning using privileged information: similarity control and knowledge transfer , 2015, J. Mach. Learn. Res..

[25] Dariu Gavrila,et al. Privacy Protection in Street-View Panoramas Using Depth and Multi-View Imagery , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Feng Zhou,et al. Matching Guided Distillation , 2020, ECCV.

[27] Yann LeCun,et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.

[29] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[30] Gedas Bertasius,et al. Is Space-Time Attention All You Need for Video Understanding? , 2021, ICML.

[31] Yan Lu,et al. Relational Knowledge Distillation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Li Fei-Fei,et al. A Study of Face Obfuscation in ImageNet , 2021, ICML.

[33] Thomas Serre,et al. HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.