Integrating video metadata into 3D models

A moving object that is detected and tracked within the field of view of a calibrated video camera's 2D data feed is represented by a 3D model by locating the object's centroid and determining its interface with the ground plane within the field of view. A grid mesh-based 3D volume model of the object is initialized by back-projecting the corresponding 2D image as a function of the centroid and the determined ground-plane interface. The non-linear dynamics of the object's tracked motion path are represented as a collection of different local linear models. The structure of the object is projected onto the 3D model, and the 2D tracks of the object are extended to 3D motions to drive the 3D model, in one aspect by learning a weighted combination of the different local linear models that minimizes the image re-projection error with respect to the model motion.
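
The initialization step can be illustrated with a minimal sketch. It assumes a pinhole camera with known intrinsics K and extrinsics (R, t) mapping world points into camera coordinates, a ground plane at z = 0, and a crude cylindrical shape prior; the function and variable names (`backproject_to_ground`, `init_volume_model`, the 0.25·height radius) are illustrative assumptions, not taken from the source.

```python
import numpy as np

def backproject_to_ground(pixel, K, R, t, ground_z=0.0):
    """Intersect the viewing ray through `pixel` with the plane z = ground_z.

    `pixel` is (u, v) in image coordinates; K is the 3x3 intrinsic matrix;
    R (3x3) and t (3,) map world points into camera coordinates.
    Returns the 3D world point where the ray meets the ground plane.
    """
    ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray_world = R.T @ ray_cam          # ray direction in world coordinates
    cam_center = -R.T @ t              # camera centre in world coordinates
    # Solve cam_center_z + s * ray_world_z = ground_z for the ray parameter s.
    s = (ground_z - cam_center[2]) / ray_world[2]
    return cam_center + s * ray_world

def init_volume_model(centroid_px, contact_px, K, R, t, n_theta=16, n_z=8):
    """Initialise a coarse cylindrical grid mesh for the tracked object.

    The contact pixel (object/ground interface) fixes the base position on the
    ground plane; the pixel offset between centroid and contact gives a rough
    metric height via similar triangles (assumes a roughly upright object).
    """
    base = backproject_to_ground(contact_px, K, R, t)
    depth = (R @ base + t)[2]                       # depth of the base point
    height = 2.0 * abs(contact_px[1] - centroid_px[1]) * depth / K[1, 1]
    radius = 0.25 * height                          # crude shape prior (assumption)
    # Vertices of a cylindrical grid mesh anchored at the ground-contact point.
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    zs = np.linspace(0.0, height, n_z)
    verts = np.array([[base[0] + radius * np.cos(a),
                       base[1] + radius * np.sin(a),
                       base[2] + z] for z in zs for a in thetas])
    return verts
```

The cylinder is only a placeholder for whatever grid mesh the system actually fits; the point of the sketch is that both the base position and the scale of the volume model follow from the centroid, the ground-plane contact, and the camera calibration alone.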
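
The dynamics step can be sketched in the same spirit. Here the non-linear motion is approximated by a convex combination of K local linear models x' = Σ_k w_k (A_k x + b_k), and the weights are chosen to minimize the 2D re-projection error against the observed track point. The local models, the state representation (a 3D position), and the use of a Nelder-Mead search over simplex-normalized weights are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def predict(state, weights, local_models):
    """Blend K local linear models: x' = sum_k w_k * (A_k @ x + b_k)."""
    return sum(w * (A @ state + b) for w, (A, b) in zip(weights, local_models))

def project(point3d, K, R, t):
    """Pinhole projection of a 3D world point into the image."""
    p_cam = R @ point3d + t
    p_img = K @ p_cam
    return p_img[:2] / p_img[2]

def fit_weights(state, observed_px, local_models, K, R, t):
    """Learn mixture weights that minimise the re-projection error between
    the blended 3D prediction and the observed 2D track point."""
    n_models = len(local_models)

    def reproj_error(w):
        w = np.abs(w) / (np.abs(w).sum() + 1e-9)   # keep weights on the simplex
        pred3d = predict(state, w, local_models)
        return np.sum((project(pred3d, K, R, t) - observed_px) ** 2)

    res = minimize(reproj_error, np.full(n_models, 1.0 / n_models),
                   method="Nelder-Mead")
    return np.abs(res.x) / (np.abs(res.x).sum() + 1e-9)
```

In use, the fitted weights drive the 3D model forward along the blended local dynamics, so the 2D track is effectively lifted into a 3D motion whose re-projection stays close to what the camera observed.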