A Human-Centered Multiple Instance Learning Framework for Semantic Video Retrieval

This paper proposes a human-centered interactive framework for automatically mining and retrieving semantic events in videos. After preprocessing, object trajectories and event models are fed into the framework's core components for learning and retrieval. Because trajectories are spatiotemporal in nature, the learning component is designed to analyze time series data. Human feedback on the retrieval results progressively guides the retrieval component. For user convenience, the retrieval results are presented as video sequences rather than as the individual trajectories they contain. As a result, the feedback does not directly label the trajectories, although the training algorithm requires such labels. To solve this problem, a mapping is established between semantic video retrieval and multiple instance learning (MIL). Experiments on real-life transportation surveillance videos demonstrate the effectiveness of the algorithm.
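
The mapping between video retrieval and MIL can be illustrated with a minimal sketch. It assumes the standard MIL convention that a bag (here, a video sequence) is as relevant as its most relevant instance (here, a trajectory); the abstract does not say which aggregation rule or learner the authors use. The names `VideoSequence`, `bag_score`, `rank`, and the toy scorer `peak_speed_change` are hypothetical and stand in for the framework's learned model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical representation: one trajectory as a short series of speed samples.
Trajectory = Sequence[float]


@dataclass
class VideoSequence:
    """A MIL bag: one video sequence plus the trajectories (instances) in it."""
    name: str
    trajectories: List[Trajectory]


def bag_score(bag: VideoSequence,
              instance_score: Callable[[Trajectory], float]) -> float:
    """Score a bag by its best instance (standard MIL assumption, not
    necessarily the rule used in the paper). Empty bags score 0.0."""
    return max((instance_score(t) for t in bag.trajectories), default=0.0)


def rank(bags: List[VideoSequence],
         instance_score: Callable[[Trajectory], float]) -> List[VideoSequence]:
    """Return video sequences ordered from most to least relevant."""
    return sorted(bags, key=lambda b: bag_score(b, instance_score), reverse=True)


def peak_speed_change(traj: Trajectory) -> float:
    """Toy instance scorer (an assumption): largest change between samples."""
    return max((abs(b - a) for a, b in zip(traj, traj[1:])), default=0.0)


bags = [
    VideoSequence("clip_a", [[1.0, 1.1, 1.2], [2.0, 2.0, 2.1]]),
    VideoSequence("clip_b", [[1.0, 1.0, 1.0], [3.0, 0.5, 0.4]]),
]
ranked = rank(bags, peak_speed_change)
print([b.name for b in ranked])  # ['clip_b', 'clip_a']
```

In this sketch the user would label whole clips such as `clip_b` as relevant or irrelevant, and those bag-level labels, not per-trajectory labels, would be what a MIL learner receives when updating the instance scorer.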
