Human sequence evaluation: the key-frame approach

The analysis of image sequences involving human agents enables multiple applications, but it also entails many difficulties. This challenging research domain is referred to here as Human Sequence Evaluation (HSE). A generic HSE system transforms image data into conceptual descriptions, and vice versa. This abstraction process is addressed by describing the HSE framework as a modular scheme, with each module devoted to a specific task domain. The contributions of this investigation are discussed within this framework, and a human motion taxonomy is established to reflect the minimal abstraction steps required for HSE.
This taxonomy includes the term action, which denotes a learnt sequence of human postures. This Thesis proposes a novel human action model used in different applications that require a representation of human movements. Several performances of a given action constitute the training data, each represented as a sequence of human postures. The training postures are described using a novel human body model, and they are used to build a human action space, called aSpace, within which each performance is represented as a parametric manifold. As each manifold is parameterized by the (normalized) temporal variation of the posture, the mean performance can be computed. Subsequently, the most characteristic postures of the action, called key-frames, are selected automatically from the postures belonging to the mean performance. The key-frames are used to build the human action model, called p-action. A p-action represents the time evolution of the human body posture during the prototypical performance of a particular action, and it is exploited for human action recognition and synthesis, as well as for performance analysis. Firstly, we describe a human action recognition procedure based on the key-frame set of each action model. Secondly, an algorithm for human action synthesis is presented: realistic and smooth human motion is generated given only the temporal duration of the synthesized action. For this purpose, p-actions are parameterized by arc length to achieve invariance to execution speed. Moreover, the proposed action model is enhanced to represent the postures corresponding to transitions between actions, thus allowing human activities to be synthesized. Lastly, a comparison framework is established to analyse the differences between performances of the same action. Specifically, the aSpace representation is used to derive a characterization of walking style in terms of the gender of the walker.
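As an illustration, the aSpace construction and key-frame selection outlined above can be sketched in a few lines. This is a minimal sketch under assumptions not stated in the abstract: principal component analysis is taken as the dimensionality-reduction step, and key-frames are picked at the frames of largest frame-to-frame posture change along a performance. The function names and parameters (`build_aspace`, `select_key_frames`, `n_components`, `n_keys`) are hypothetical, and the actual criteria used in the Thesis may differ.

```python
import numpy as np

def build_aspace(performances, n_components=3):
    """Build a low-dimensional action space (aSpace) by PCA over postures.

    `performances` is a list of (frames, dof) arrays, each one a single
    execution of the same action; every row is a posture vector.
    """
    postures = np.vstack(performances)           # all training postures
    mean = postures.mean(axis=0)
    centered = postures - mean
    # Principal directions of posture variation span the aSpace
    # (an assumed choice; the Thesis defines its own construction).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                    # (n_components, dof)
    return mean, basis

def project(postures, mean, basis):
    """Project posture vectors into the aSpace: one point per frame,
    so a performance traces a parametric curve in the space."""
    return (postures - mean) @ basis.T

def select_key_frames(mean_performance, n_keys=4):
    """Pick the n_keys most characteristic postures along the mean
    performance: here, the frames with the largest frame-to-frame
    change (an illustrative criterion only)."""
    deltas = np.linalg.norm(np.diff(mean_performance, axis=0), axis=1)
    idx = np.argsort(deltas)[-n_keys:]
    return np.sort(idx)
```

Projecting every frame of a performance yields the parametric manifold mentioned above; averaging aligned manifolds would give the mean performance from which key-frames are drawn.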
To conclude this investigation, we address the task of embedding our human action model within the HSE framework. For this purpose, Situation Graph Trees (SGTs) are used to model the knowledge required to represent human activities and behaviors. By adapting our action model to the SGT methodology, we derive semantic primitives from the quantitative information obtained from image sequences, and we also generate synthetic sequences from the conceptual information embedded in activity and behavior models. Finally, we show examples of SGTs that infer the behavior of actors within a scene, and that generate synthetic behavior for virtual human agents.
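The step from quantitative tracking data to semantic primitives can be illustrated with a toy mapping. This is only a hypothetical sketch in the spirit of the SGT methodology named above, not the predicate scheme actually used there: the predicate names (`is_moving`, `has_stopped`) and the speed threshold are invented for illustration.

```python
import math

def speed(track, t):
    """Instantaneous speed from two consecutive (x, y) positions
    of a tracked agent (units per frame)."""
    (x0, y0), (x1, y1) = track[t - 1], track[t]
    return math.hypot(x1 - x0, y1 - y0)

def semantic_primitive(track, t, moving_thresh=0.5):
    """Map the quantitative state at frame t to a conceptual
    description, as a stand-in for an SGT-style predicate."""
    if speed(track, t) < moving_thresh:
        return "has_stopped(agent)"
    return "is_moving(agent)"
```

An SGT would organize such predicates into situations and traverse the graph to produce, or in the synthetic direction consume, a conceptual description of the agent's behavior over time.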