A methodology for knowledge acquisition from the web

Fast and easy access to up-to-date information requires information management tools that can explore and analyse the huge number of available electronic resources. The Web offers a large amount of valuable information for every possible domain, but its human-oriented representation and its size make any kind of centralised computer-based processing difficult and extremely time consuming. In this paper, a combination of distributed AI and knowledge acquisition techniques is proposed to tackle this problem. In particular, we have designed an incremental, domain-independent learning methodology, modelled over a multi-agent system, that crawls the Web and composes knowledge structures (ontologies) by interrelating several automatically obtained taxonomies of terms according to the user’s interests. Moreover, the obtained ontologies are used to represent, in a structured way, the web resources currently available for the corresponding domain. The paper also presents example results for medical and technological domains and compares them, wherever possible, against publicly available taxonomic web search engines, obtaining a considerable improvement in all cases.
