A natural language system for retrieval of captioned images

ANVIL is an information retrieval system that uses natural language processing techniques to retrieve captioned images. It extracts dependency structures from the image captions and from user queries, then applies a high-accuracy matching algorithm that recursively explores the dependency structures to determine their similarity. A further algorithm extracts additional contextual information after a successful match, with the aim of helping users understand and organise the retrieval results. ANVIL was developed to high engineering standards; as well as the research aspects of the system, we also discuss some of the design and development issues. English and Japanese versions of the system have been developed.
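
The idea of recursively comparing dependency structures can be illustrated with a minimal sketch. This is not ANVIL's actual algorithm, which the abstract does not specify. It assumes a simplified tree in which each word has labelled dependents, and it uses exact word and relation equality where a real system would compute a graded similarity. The names `Node`, `match`, `find_match`, and the relation labels `mod` and `on` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A word in a dependency structure, with its labelled dependents."""
    word: str
    # Each child is a (relation, Node) pair, e.g. ("mod", Node("black")).
    children: list = field(default_factory=list)


def match(query: Node, caption: Node) -> bool:
    """Return True if the query structure is contained in the caption structure.

    The head words must agree, and every query dependent must match some
    caption dependent attached by the same relation. The caption may contain
    extra material that the query does not mention.
    """
    if query.word != caption.word:
        return False
    return all(
        any(q_rel == c_rel and match(q_child, c_child)
            for c_rel, c_child in caption.children)
        for q_rel, q_child in query.children
    )


def find_match(query: Node, caption: Node):
    """Search the caption tree for a node at which the query matches.

    Returns the matching caption node, or None if there is no match.
    """
    if match(query, caption):
        return caption
    for _, child in caption.children:
        found = find_match(query, child)
        if found is not None:
            return found
    return None


# Caption: "a black dog on a beach" (assumed analysis, for illustration only)
caption = Node("dog", [("mod", Node("black")), ("on", Node("beach"))])

# Query "black dog" is contained in the caption structure.
query_ok = Node("dog", [("mod", Node("black"))])

# Query "black beach" shares both words with the caption, but not the relation
# between them, so a structural match rejects it.
query_bad = Node("beach", [("mod", Node("black"))])
```

In this sketch `find_match(query_ok, caption)` returns the `dog` node, while `find_match(query_bad, caption)` returns `None`, even though a bag-of-words comparison would accept both queries. The dependents of the matched node that the query did not use (here, `on beach`) are the kind of material a context-extraction step could report after a successful match.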
