Ontology Building using Parallel Enumerative Structures

The semantics of a text is carried both by the natural language it contains and by its layout. Because ontology building processes have so far considered only plain text, our aim is to elicit the structure of the text as well. We focus here on parallel enumerative structures because they carry implicit or explicit hierarchical relations, have salient visual properties, and occur frequently in corpora. We have defined a process that identifies them in a text, translates them into ontology structures, and finally links these structures to the concepts of an existing ontology. We assessed this process on Wikipedia encyclopaedic articles, which are rich in definitions and statements and contain many enumerations. The ontology structures obtained are then used to enrich an ontology that we had built automatically from database specification documents.
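As a rough illustration of the three steps named above (identify, translate, link), the following Python sketch handles only the simplest case: a primer line ending in a colon, followed by bulleted items. It is not the process described in the paper. The bullet syntax, the `subClassOf` relation, the label-matching heuristic and all function names are assumptions made for this example.

```python
import re

# Assumption: enumeration items are lines starting with "*" or "-".
ITEM = re.compile(r"^\s*[*\-]\s+(.*\S)\s*$")


def find_enumerations(text):
    """Step 1 (identify): return (primer, items) pairs.

    A primer is taken to be a line ending in ':' that is immediately
    followed by one or more bullet lines.
    """
    enums, primer, items = [], None, []
    for line in text.splitlines():
        match = ITEM.match(line)
        if match and primer is not None:
            items.append(match.group(1))
            continue
        # Any non-item line closes the enumeration in progress.
        if primer is not None and items:
            enums.append((primer, items))
        primer, items = None, []
        if line.rstrip().endswith(":"):
            primer = line.strip().rstrip(":").strip()
    if primer is not None and items:
        enums.append((primer, items))
    return enums


def link_concept(primer, labels):
    """Step 3 (link): return the first known concept whose label
    appears as a whole word in the primer, or None."""
    lowered = primer.lower()
    for label, concept in labels.items():
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", lowered):
            return concept
    return None


def build_triples(text, labels, relation="subClassOf"):
    """Step 2 (translate): turn each linked enumeration into
    (item, relation, concept) triples. The hierarchical relation
    is assumed, not inferred from the text."""
    triples = []
    for primer, items in find_enumerations(text):
        concept = link_concept(primer, labels)
        if concept is None:
            continue  # no anchor in the existing ontology
        triples.extend((item, relation, concept) for item in items)
    return triples


if __name__ == "__main__":
    sample = "A tool can be one of the following:\n* hammer\n* saw\n\nOther text."
    # Hypothetical existing ontology: label -> concept identifier.
    ontology_labels = {"tool": "onto:Tool"}
    for triple in build_triples(sample, ontology_labels):
        print(triple)
```

A real system would need linguistic analysis of the primer and items to decide which relation an enumeration expresses. The sketch fixes a single relation to keep the data flow visible.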
