Augmented Transition Network as a Semantic Model for Video Data

This paper proposes an abstract semantic model, the augmented transition network (ATN), that can model both video data and user interactions. An ATN and its subnetworks can model video data at different granularities: scenes, shots, and key frames. Multimedia input strings serve as the inputs to ATNs. Key frame selection is based on the temporal and spatial relations of the semantic objects in each shot. These relations are captured by our proposed unsupervised video segmentation method, which treats the partitioning of each frame as a joint estimation of the partition and the class parameter variables. Unlike existing semantic models, which handle only multimedia presentation, multimedia database searching, or browsing, ATNs together with multimedia input strings model all three within a single framework.
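The hierarchy the abstract describes (an ATN whose subnetworks model scenes, shots, and key frames, driven by a multimedia input string of symbols) can be made concrete with a small sketch. The paper publishes no code, so everything below is hypothetical: the `Arc`/`ATN` classes, the `accept` walker, and the symbol names are illustrative choices, and the registers and arc actions that make a transition network "augmented" are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Arc:
    """One transition: match an input symbol (CAT), descend into a subnetwork (PUSH), or return (POP)."""
    kind: str          # "CAT", "PUSH", or "POP"
    label: str         # symbol to match (CAT) or subnetwork name (PUSH)
    target: str = ""   # state reached after the arc is taken


@dataclass
class ATN:
    """One network: a start state plus arcs grouped by source state."""
    name: str
    start: str
    arcs: dict = field(default_factory=dict)  # state -> [Arc, ...]

    def add(self, state, arc):
        self.arcs.setdefault(state, []).append(arc)


def accept(networks, net_name, symbols, i=0):
    """Greedy, non-backtracking walk; returns the input position reached, or -1 on failure."""
    net = networks[net_name]
    state = net.start
    while True:
        for arc in net.arcs.get(state, []):
            if arc.kind == "POP":
                return i
            if arc.kind == "CAT" and i < len(symbols) and symbols[i] == arc.label:
                i, state = i + 1, arc.target
                break
            if arc.kind == "PUSH":
                j = accept(networks, arc.label, symbols, i)
                if j >= 0:
                    i, state = j, arc.target
                    break
        else:
            return -1  # no arc applies in this state


# A video is one or more shots; a shot is one or more key frames.
nets = {"VIDEO": ATN("VIDEO", "V0"), "SHOT": ATN("SHOT", "S0")}
nets["VIDEO"].add("V0", Arc("PUSH", "SHOT", "V1"))
nets["VIDEO"].add("V1", Arc("PUSH", "SHOT", "V1"))  # loop over further shots
nets["VIDEO"].add("V1", Arc("POP", ""))
nets["SHOT"].add("S0", Arc("CAT", "keyframe", "S1"))
nets["SHOT"].add("S1", Arc("CAT", "keyframe", "S1"))
nets["SHOT"].add("S1", Arc("POP", ""))

print(accept(nets, "VIDEO", ["keyframe", "keyframe", "keyframe"]))  # 3: all input consumed
```

Similarly, the phrase "joint estimation of the partition and the class parameter variables" suggests a Bayesian MAP formulation (the segmentation work it builds on is Bayesian). The abstract names the quantities but not the symbols, so the notation below is an assumption: $Y$ the observed frame data, $P$ the partition, $\theta$ the class parameters.

```latex
% Hypothetical notation: Y = observed frame data, P = partition, \theta = class parameters.
(\hat{P}, \hat{\theta})
  = \arg\max_{P,\,\theta} \; p(P, \theta \mid Y)
  = \arg\max_{P,\,\theta} \; p(Y \mid P, \theta)\, p(P, \theta)
```

Under this reading, the partition and the class parameters are estimated jointly rather than in separate stages.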
