Ontology-driven Information Retrieval in FF-Poirot

This paper proposes a new approach to domain-specific information retrieval and information extraction from the Web, based on a query expansion technique driven by an ad hoc ontology. The system has been built and tested within the FF-Poirot project, where it supports fine-grained retrieval from the Internet aimed at detecting fraudulent financial sites. In a first stage, starting from a short list of keywords given by the user, the application mines the Web and retrieves relevant documents. These documents are then clustered into coherent groups, each focusing on a specific subject. The ontology model represents the most important concepts of the domain of interest and links them to the user's need, as expressed lexically by the keywords. Once the first stage has produced clusters of documents, the ontology can be used to extract the most interesting documents (i.e., those likely to be the fraudulent target sites in the FF-Poirot application). By browsing the ontology and selecting specific concepts, the user can trigger a query expansion process that refines the search: a new query is created that embodies the terminological evidence tied to the selected concepts. The paper describes the overall software architecture of the application as used in the project, focusing on the query expansion engine and the supporting ontological model.
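
The query expansion step described above can be illustrated with a minimal sketch. It is not the FF-Poirot implementation: the ontology is assumed here to be a plain mapping from concept names to lists of terms, and the concept names, the terms, the function `expand_query`, and the boolean query syntax are all hypothetical choices made for illustration.

```python
def expand_query(keywords, selected_concepts, ontology):
    """Combine the user's keywords with the terms tied to the selected concepts.

    Assumption: `ontology` maps a concept name to a list of domain terms.
    The result is a boolean query string in a generic AND/OR syntax.
    """
    # Each original keyword stays mandatory in the refined query.
    parts = ['"{}"'.format(k) for k in keywords]
    for name in selected_concepts:
        terms = ontology.get(name, [])
        if not terms:
            # A concept with no terminological evidence adds nothing.
            continue
        # Any one term of a selected concept is enough to match it.
        alternatives = " OR ".join('"{}"'.format(t) for t in terms)
        parts.append("({})".format(alternatives))
    return " AND ".join(parts)


# Toy ontology; concept names and terms are invented for this example.
ontology = {
    "UnregisteredOffer": ["guaranteed return", "risk-free investment"],
    "Solicitation": ["act now", "limited offer"],
}

query = expand_query(["investment", "shares"], ["UnregisteredOffer"], ontology)
print(query)
# "investment" AND "shares" AND ("guaranteed return" OR "risk-free investment")
```

Keeping the original keywords mandatory while OR-ing the terms of each concept is one possible design: it narrows the first-stage results toward the selected concepts without requiring every term to appear in a document.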