Populating a Domain Ontology from a Web Biographical Dictionary of Music - An Unsupervised Rule-based Method to Handle Brazilian Portuguese Texts

An increasing amount of information is available on the web, usually expressed as text that constitutes unstructured or semi-structured data. Semantic information is implicit in these texts, since they are mainly intended for human consumption and interpretation. Because unstructured information is not easily handled automatically, an information extraction process has to be used to identify concepts and establish relations among them. The outcome of information extraction can be represented as a domain ontology. Ontologies are an appropriate way to represent structured knowledge bases, enabling sharing, reuse and inference. In this paper, an information extraction process is used to populate a domain ontology. It targets Brazilian Portuguese texts from a biographical dictionary of music, which require specific tools because of aspects unique to the language. An unsupervised rule-based method is proposed. Through this process, latent concepts and relations expressed in natural language can be extracted and represented as an ontology, allowing new uses and visualizations of the content, such as semantic browsing and inferring new
