Temporal relation algebra for audiovisual content analysis

This work aims to characterize the content and structure of audiovisual documents by analysing the temporal relationships between basic events resulting from different segmentations of the same document. To this end, we need to represent and reason about time. We propose a parametric representation of the temporal relation between segments (points or intervals), in which the parameters characterize the relationship between two non-convex intervals corresponding to two segmentations in the video analysis domain. The relationship is represented by a co-occurrence matrix called the Temporal Relation Matrix (TRM). Each document is represented by a set of TRMs, one computed for each pair of segmentations of the same document obtained with different features. The TRMs are then analysed to detect semantic events, to highlight clues about the structure of the video content, or to classify documents by type. To capture higher-level semantic events and document structure, we need to apply operations such as composition, disjunction, complement and intersection to the basic temporal relations and TRMs. These operations bring more complex patterns to light, e.g. event 1 occurs at the same time as event 2 and is followed by event 3. In this paper, we define a temporal relation algebra, including its set of operations, based on the parametric representation and the TRM defined above. Several experiments on different audio and video documents show the efficiency of the proposed representation and of the defined operations for audiovisual content analysis.
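To make the idea of a co-occurrence matrix over temporal relations more concrete, the following is a minimal sketch, not the representation proposed in the paper. The abstract does not spell out the parametric representation, so the sketch substitutes Allen's thirteen qualitative interval relations as a stand-in and simply counts how often each relation holds between the segments of two segmentations. Each segmentation (a non-convex interval) is assumed to be a list of disjoint `(start, end)` pairs; the names `allen_relation` and `relation_counts` and the example data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), assumed to satisfy start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Remaining cases: the intervals partially overlap.
    return "overlaps" if a0 < b0 else "overlapped-by"


def relation_counts(seg_a: List[Interval], seg_b: List[Interval]) -> Counter:
    """Count relation occurrences over all segment pairs (a stand-in for a TRM)."""
    return Counter(allen_relation(a, b) for a in seg_a for b in seg_b)


# Hypothetical segmentations of the same document, in seconds.
speech = [(0.0, 4.0), (6.0, 9.0)]
music = [(4.0, 6.0), (8.0, 12.0)]
counts = relation_counts(speech, music)
print(dict(counts))
```

Counting over every pair of segments is the simplest choice. A representation based on continuous parameters, as described in the abstract, would replace the thirteen discrete labels with parameter values computed between the two segments.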
