Understanding and modeling user interests in consumer videos

This paper analyzes users' interests in viewing and organizing consumer videos. We propose a taxonomy of relevant concepts with three basic dimensions of interest (DOIs), corresponding to objects, scenes, and events, together with effective models for predicting user interest in each dimension. Our conclusions are backed by an extensive user study in which participants annotated short clips from diverse, real consumer videos and scored the importance of each DOI. Analysis of the study data reveals high consistency (70%) of the scores across users, as well as independence between objects and events. In addition, we show that heuristic rules and neural networks can accurately predict these scores from camera motion, foreground object, and audio information. Automatic and effective prediction of user interests has the potential to improve applications for annotating and summarizing consumer videos.