A Methodology to Discover Semantic Features from Textual Resources

Data analysis algorithms that process textual data rely on extracting relevant features from text and associating them appropriately with their formal semantics. This paper presents a method that assists this task by annotating extracted textual features with concepts from a background ontology. The method is automatic and unsupervised, and it has been designed generically so that it can be applied to textual resources ranging from plain text to semi-structured resources (such as Wikipedia articles). The system has been tested on tourist destinations and Wikipedia articles, with promising results.
