Classification of human actions using face and hands detection

In this paper, we describe a novel classification technique that separates a video of an activity, such as office work, into scenes according to the task being performed. Even when the frame-by-frame difference of the whole image is small between tasks, the worker's movement differs considerably, because the positions of the face and hands depend on the task. In addition, the worker tends to turn his or her face toward the objects particular to each task, such as a PC or a document. We therefore separate tasks based on face position, face angle in depth, and hand positions. To compare frames in a video, we use the Mahalanobis distance to measure the difference between multivariate data consisting of the face coordinates, the face angle in depth, and the coordinates of both hands. To separate tasks using the Mahalanobis distance of each frame, we apply hierarchical clustering to classify the frames of a video by task. For robust detection of both hands, we use a color-based method that searches for hand areas using the face color. Although the color of the hands changes with the lighting conditions, it should remain very similar to that of the face in an office. Therefore, even when the lighting conditions change, our color-based hand detection method needs no adjustment. We apply this classification technique to separate office work video into individual sets of task scenes. As a result, our technique shows better task separation performance than the histogram-based boundary detection technique.
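
The frame comparison and grouping steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical seven-value feature vector per frame (face x, face y, face angle in depth, left hand x and y, right hand x and y). Several choices are not stated in the abstract and are assumptions made here for illustration: the covariance is estimated from a short single-task reference clip, a small ridge term keeps the covariance invertible, clusters are merged by average linkage, and the number of tasks is fixed in advance. The feature values and the task names `typing` and `reading` are synthetic.

```python
from itertools import combinations
from math import sqrt

def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(u, v, cov):
    """sqrt((u - v)^T cov^-1 (u - v)), computed without forming the inverse."""
    diff = [p - q for p, q in zip(u, v)]
    return sqrt(max(0.0, sum(d * y for d, y in zip(diff, solve(cov, diff)))))

def cluster_frames(features, n_tasks, cov):
    """Agglomerative clustering with average linkage (an assumed choice)."""
    n = len(features)
    dist = {(i, j): mahalanobis(features[i], features[j], cov)
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    # Repeatedly merge the two closest clusters until n_tasks remain.
    while len(clusters) > n_tasks:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def synthetic_task(base, n_frames=6):
    """Synthetic frames: a fixed pose plus small deterministic jitter."""
    def jitter(k):
        return (k % 3, k % 2, k % 4, k % 5, -(k % 3), k % 2, k % 4)
    return [[b + j for b, j in zip(base, jitter(k))] for k in range(n_frames)]

# (face_x, face_y, face_angle, lhand_x, lhand_y, rhand_x, rhand_y), all made up.
typing = synthetic_task((100, 80, -20, 60, 200, 140, 200))
reading = synthetic_task((130, 85, 25, 90, 150, 200, 160))
frames = typing + reading

# Assumption: covariance comes from a single-task reference clip, plus a ridge
# term because so few frames give a singular matrix.
cov = covariance(typing)
for i in range(len(cov)):
    cov[i][i] += 1e-3

labels = cluster_frames(frames, n_tasks=2, cov=cov)
print(labels)
```

Estimating the covariance from a single-task clip means that ordinary within-task jitter is scaled down while a change of pose between tasks produces a large distance. Estimating it from all frames instead would also normalize the between-task direction and can blur the separation. In practice a numerical library would replace the hand-written linear solver.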
