Web-scale taxonomy learning

In this paper, we propose an automatic, unsupervised methodology for obtaining taxonomies of terms from the Web and for organizing the retrieved web sites into a meaningful structure for a desired domain, without previous knowledge. It relies on intensive use of web search engines, both to retrieve domain-relevant resources from which to extract knowledge and to obtain web-scale statistics from which to infer the relevance of that knowledge. The results can ease access to web resources or serve as a first step in constructing ontologies suitable for the Semantic Web.
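
As an illustration of how web-scale statistics might be turned into a relevance judgment, the following minimal sketch scores candidate terms by how often they co-occur with the domain term. The abstract does not specify the actual measure, so everything here is an assumption: the ratio used as a score, the function names `cooccurrence_score` and `rank_candidates`, the threshold, and the hit counts, which are made-up numbers standing in for what a search engine would report.

```python
from typing import Dict, List, Tuple


def cooccurrence_score(joint_hits: int, candidate_hits: int) -> float:
    """Fraction of pages mentioning the candidate that also mention the domain.

    Assumed measure for illustration only; not the paper's stated formula.
    """
    if candidate_hits <= 0:
        return 0.0
    return joint_hits / candidate_hits


def rank_candidates(
    hit_counts: Dict[str, Tuple[int, int]], threshold: float
) -> List[Tuple[str, float]]:
    """Keep candidates whose score reaches the threshold, best first.

    hit_counts maps a candidate term to (joint_hits, candidate_hits), where
    joint_hits counts pages containing both the candidate and the domain term.
    """
    scored = [
        (term, cooccurrence_score(joint, alone))
        for term, (joint, alone) in hit_counts.items()
    ]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical hit counts for the domain "sensor" (not real search results).
example_counts = {
    "pressure sensor": (800, 1000),
    "cheap sensor": (50, 5000),
}
selected = rank_candidates(example_counts, threshold=0.1)
```

With these invented counts, only "pressure sensor" passes the threshold, since most pages that mention it also mention the domain, whereas "cheap sensor" appears mostly outside it.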
