NLPX - An XML-IR System with a Natural Language Interface

Traditional information retrieval (IR) systems respond to user queries with ranked lists of relevant documents. Because XML documents separate content from structure, individual XML elements can be selected in isolation. Users therefore expect XML-IR systems to return highly relevant results that are more precise than entire documents. This paper presents such a system. It accepts queries in both natural language (English) and a formal XPath-like language (NEXI), and matches them to a set of relevant, appropriately sized elements using an effective ranking scheme.
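
To make the idea of element-level retrieval concrete, the following minimal sketch scores every element of a small XML document against a keyword query and returns the elements, rather than the whole document, in ranked order. It is an illustration only and not the ranking scheme used by NLPX: the sample document, the function names `tokens` and `rank_elements`, and the scoring rule (query-term hits divided by element length, a crude stand-in for preferring appropriately sized elements) are all assumptions introduced here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """<article>
  <title>XML retrieval</title>
  <sec><p>Ranking XML elements for retrieval.</p></sec>
  <sec><p>Unrelated notes on typesetting.</p></sec>
</article>"""


def tokens(text):
    """Lower-case a string and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rank_elements(xml_source, query):
    """Return (score, tag, text) for each matching element, best first.

    Assumed scoring rule: number of query-term occurrences in the
    element's full text (including descendants) divided by the number
    of tokens in that text, so short focused elements outrank large
    containers.
    """
    terms = tokens(query)
    root = ET.fromstring(xml_source)
    scored = []
    for elem in root.iter():  # visits the root and every descendant
        words = tokens("".join(elem.itertext()))
        if not words:
            continue
        hits = sum(words.count(term) for term in terms)
        if hits:
            scored.append((hits / len(words), elem.tag, " ".join(words)))
    # Python's sort is stable, so ties keep document order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


results = rank_elements(DOC, "XML retrieval")
for score, tag, text in results:
    print(f"{score:.2f}  <{tag}>  {text}")
```

Under this assumed rule the `title` element ranks first and the enclosing `article` ranks last, which mirrors the expectation stated above that an XML-IR system should prefer focused elements over entire documents.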