Video Indexing Using Face Appearance and Shot Transition Detection

Automatically indexing human faces in videos could enable a wide range of applications, such as automatic video content analysis, data mining, and on-demand streaming. Most relevant works in the literature achieve full indexing of videos in real scenarios by exploiting additional media features (e.g. audio and text), which are fused with facial appearance information to make the overall framework accurate and robust. However, there are application contexts where such multimedia data are either unavailable or unreliable, and for which existing solutions are not well suited. This paper explores this challenging research path by introducing a new video indexing pipeline based entirely on computer vision. The system has been validated and tested in two typical scenarios where no additional multimedia data could be exploited: broadcast political video documentaries and healthcare therapy sessions focused on non-verbal skills.
