Learning non-taxonomic relationships from web documents for domain ontology construction

In recent years, much effort has been put into ontology learning. However, the knowledge acquisition process typically focuses on the taxonomic aspect. The discovery of non-taxonomic relationships is often neglected, even though it is fundamental to structuring domain knowledge. This paper presents an automatic and unsupervised methodology that addresses the non-taxonomic learning process for constructing domain ontologies. It discovers domain-related verbs, extracts non-taxonomically related concepts, and labels the relationships between them, using the Web as a corpus. The paper also discusses how the obtained relationships can be automatically evaluated against WordNet and presents encouraging results for several domains.
