Semantic video search using natural language queries

Recent advances in computer vision and artificial intelligence algorithms have allowed automatic extraction of metadata from video. This metadata can be represented by using the RDF/OWL ontology which can encode scene objects and their relationships in an unambiguous and well-formed manner. The encoded data can be queried using SPARQL. However, SPARQL has a steep learning curve and cannot be directly utilized by a general user for video content search. In this paper, we propose a method to bridge this gap by automatically translating user provided natural language query into an ontology-based SPARQL query for semantic video search. The proposed method consists of three major steps. First, semantically labeled training corpus of natural language query sentences is used for learning the Semantic Stochastic Context Free Grammar (SSCFG). Second, given a user provided natural language query sentence, we use the Earley-Stolcke parsing algorithm to determine the maximum likelihood semantic parsing of the query sentence. This parsing infers the semantic meaning for each word in the query sentence from which the SPARQL query is constructed. Third, the SPARQL query is executed to retrieve relevant video segments from the RDF-OWL video content database. The method is evaluated by running natural language queries on surveillance videos from maritime and land-based domains, though the framework itself is general and extensible to search videos from other domains.

[1]  Haofen Wang,et al.  Q2Semantic: A Lightweight Keyword Interface to Semantic Search , 2008, ESWC.

[2]  Chong Wang,et al.  PANTO: A Portable Natural Language Interface to Ontologies , 2007, ESWC.

[3]  Nong Sang,et al.  Bayesian Inference for Layer Representation with Mixed Markov Random Field , 2007, EMMCVPR.

[4]  Peter Thanisch,et al.  Natural language interfaces to databases – an introduction , 1995, Natural Language Engineering.

[5]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[6]  Irfan Essa,et al.  Recognizing Multitasked Activities using Stochastic Context-Free Grammar , 2001 .

[7]  Peter Thanisch,et al.  MASQUE/SQL: an efficient and portable natural language query interface for relational databases , 1993 .

[8]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[9]  Oren Etzioni,et al.  Towards a theory of natural language interfaces to databases , 2003, IUI '03.

[10]  Abraham Bernstein,et al.  Querix: A Natural Language Interface to Query Ontologies Based on Clarification Dialogs , 2006 .

[11]  Mun Wai Lee,et al.  SAVE: A framework for semantic annotation of visual events , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[12]  Jay Earley,et al.  An efficient context-free parsing algorithm , 1970, Commun. ACM.

[13]  Ramakant Nevatia,et al.  An Ontology for Video Event Representation , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.