TermExtractor: a Web Application to Learn the Shared Terminology of Emergent Web Communities

In the Semantic Web era, many techniques have been proposed to capture the explicit knowledge of a virtual community and to represent this knowledge in a structured form, often referred to as a domain ontology. One of the first steps of the ontology-building task is to collect a vocabulary of domain-relevant terms. We designed a high-performing technique to automatically extract the shared terminology from available documents in a given domain. This technique has been successfully tested and submitted for large-scale evaluation, in the domain of enterprise interoperability, by the members of the INTEROP network of excellence. To make the technique available to the members of any web community, we developed a web application that allows users to acquire a terminology in any domain (incrementally or in a single step) by submitting documents of variable length and format, and to validate the obtained results online. The system also supports collaborative evaluation by a group of experts. The web application has been widely tested in several domains by many international institutions that volunteered for this task.