Looking at People

There is a great need for programs that can describe what people are doing from video. This is difficult to do: it is hard to identify and track people in video sequences; we have no canonical vocabulary for describing what people are doing; and the interpretation of what people are doing depends very strongly on what is nearby. Tracking is hard because it is important to track relatively small structures that can move relatively fast, for example, lower arms. I will describe research into kinematic tracking (tracking that reports the kinematic configuration of the body) that has resulted in a fairly accurate, fully automatic tracker that can keep track of multiple people. Once one has tracked the body, one must interpret the results. One way to do so is to have a motion synthesis system that takes the track and produces a motion that is (a) like a human motion and (b) close to the track. Our work has produced a high-quality motion synthesis system that can produce motions that look very much like human activities. I will describe work that couples that system with a tracker to produce a description of the activities, entirely automatically. I will speculate on some of the many open problems. What should one report? How do nearby objects affect one’s interpretation of activities? How can one interpret patterns of behavior?
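One way to read the synthesis criterion above, sketched here with notation chosen purely for illustration (the terms E_track, E_natural and the weight lambda are not from the abstract), is as an optimization over candidate motions m given an observed track t:

\[
\hat{m} \;=\; \arg\min_{m}\; \underbrace{E_{\text{track}}(m, t)}_{\text{(b) stay close to the track}} \;+\; \lambda\,\underbrace{E_{\text{natural}}(m)}_{\text{(a) look like a human motion}}
\]

Here E_track measures how far the synthesized motion departs from the tracked configuration, E_natural penalizes motions unlike observed human movement, and lambda trades the two off; this is only one plausible formalization of the stated goals, not the specific formulation used in the work described.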