Syntactic Information Retrieval

Natural language processing (NLP) techniques are widely believed to have the potential to improve the accuracy of information retrieval (IR). In this paper we report a proof-of-concept study of a new approach to NLP-based IR. Documents and queries are represented as syntactic parse trees generated by a natural language parser, and a document is matched against a query by operating directly on these tree representations, with tree comparison as the key operation. We design an IR experiment to test whether this approach is feasible. The experimental results suggest that the approach is promising and has the potential to outperform the standard bag-of-words approach to information retrieval, especially for long queries.
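To make the idea of matching by tree comparison concrete, the following minimal Python sketch scores a document against a query by the overlap of their subtrees. It is an illustration only: the abstract does not specify the comparison measure, so the Dice overlap over shared subtrees used here is an assumption, as are the nested-tuple tree encoding and the names `subtrees`, `tree_similarity`, and `rank`. The parse trees are written by hand rather than produced by a parser.

```python
from collections import Counter

# Assumed encoding: a parse tree is a nested tuple (label, child, child, ...);
# a leaf is a plain string holding the word.

def subtrees(tree):
    """Yield every complete subtree rooted at an internal node."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def tree_similarity(a, b):
    """Dice coefficient over the multisets of subtrees (an assumed measure)."""
    ca, cb = Counter(subtrees(a)), Counter(subtrees(b))
    shared = sum((ca & cb).values())  # multiset intersection
    total = sum(ca.values()) + sum(cb.values())
    return 2 * shared / total if total else 0.0

def rank(query, docs):
    """Return document names ordered by decreasing similarity to the query."""
    return sorted(docs, key=lambda name: tree_similarity(query, docs[name]),
                  reverse=True)

# Hand-written toy trees: the two documents contain the same words,
# but the second swaps subject and object.
query = ("S", ("NP", ("N", "dogs")),
              ("VP", ("V", "chase"), ("NP", ("N", "cats"))))
doc_same = query
doc_swapped = ("S", ("NP", ("N", "cats")),
                    ("VP", ("V", "chase"), ("NP", ("N", "dogs"))))

docs = {"same": doc_same, "swapped": doc_swapped}
ranking = rank(query, docs)
```

Under this measure the swapped document scores lower than the identical one, although a bag-of-words representation could not tell the two apart, since both contain exactly the same words.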