Ontology and taxonomy collaborated framework for meeting classification

A framework for the classification of meeting videos is proposed in this paper. The framework consists of a four-level concept hierarchy of movements, events, behaviors, and genres, built on a meeting ontology and taxonomy. An ontology is a formal specification of domain concepts and their relationships; a taxonomy is a general categorization based on class/subclass relationships. This concept hierarchy is mapped onto an implementation of finite state machines (FSMs) and a rule-based system (RBS) that together classify the meetings. The FSMs detect events from the movements (head and hand tracks), and the RBS classifies each meeting from the detected events and the behaviors of the people present. The framework is novel and scalable: new meeting types can be added without retraining. We conducted experiments on various meeting sequences and classified meetings into voting, argument, presentation, and object passing. The framework has applications in automated video surveillance, video segmentation and retrieval (multimedia), human-computer interaction, and augmented reality.
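To make the FSM-to-RBS pipeline concrete, here is a minimal sketch, ours rather than the authors' implementation: a single hypothetical hand-raise FSM driven by head/hand track coordinates, followed by a toy rule base that maps event counts to one of the meeting genres named in the abstract. All state names, thresholds, and rules are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): an FSM that fires a
# hypothetical "hand_raise" event from head/hand track positions, and a
# toy rule base mapping detected events to a meeting genre. All names,
# thresholds, and rules below are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class HandRaiseFSM:
    """Emits a 'hand_raise' event when a hand track rises above the head track."""
    state: str = "IDLE"
    events: list = field(default_factory=list)

    def step(self, t: int, head_y: float, hand_y: float) -> None:
        # Image coordinates: a smaller y means higher in the frame.
        if self.state == "IDLE" and hand_y < head_y:
            self.state = "RAISED"
            self.events.append(("hand_raise", t))
        elif self.state == "RAISED" and hand_y >= head_y:
            self.state = "IDLE"

def classify_meeting(events: list) -> str:
    """Toy rule base: event counts decide the meeting genre (assumed rules)."""
    raises = sum(1 for name, _ in events if name == "hand_raise")
    passes = sum(1 for name, _ in events if name == "object_pass")
    if raises >= 3:           # many hand raises -> voting
        return "voting"
    if passes >= 1:           # an object changed hands -> object passing
        return "object passing"
    return "argument"         # fallback rule, purely for illustration

# Usage: feed per-frame track coordinates through the FSM, then classify.
fsm = HandRaiseFSM()
for t, (head_y, hand_y) in enumerate([(100, 120), (100, 90), (100, 130)]):
    fsm.step(t, head_y, hand_y)
print(classify_meeting(fsm.events))
```

Because the rules live outside the detectors, adding a new meeting type in a design like this only requires new rules (and possibly new FSMs), not retraining, which is the scalability property the abstract claims.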
