Constructing a Focused Taxonomy from a Document Collection

We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2,000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% and 90% for concepts and relations respectively; recall was 75% with respect to a manually generated taxonomy for the same domain.
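The "focused" part of the idea can be illustrated with a minimal sketch: starting from the concepts found in a collection, keep only the hierarchical links of a larger knowledge source that lead upward from those concepts. This is not the authors' implementation. The triple layout, the `skos:broader` predicate, the function name `focus_taxonomy`, and the sample data are all assumptions made for illustration; the abstract only states that an RDF model is used.

```python
from collections import defaultdict

# Assumed predicate name; the abstract does not say which vocabulary is used.
BROADER = "skos:broader"


def focus_taxonomy(triples, doc_concepts):
    """Return the (child, parent) links reachable upward from doc_concepts.

    triples      -- iterable of (subject, predicate, object) tuples, a
                    hypothetical stand-in for an RDF store
    doc_concepts -- concepts assumed to have been identified in the documents
    """
    # Index every broader-link by its child concept.
    parents = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == BROADER:
            parents[subject].add(obj)

    kept = set()
    seen = set()
    frontier = list(doc_concepts)
    while frontier:
        concept = frontier.pop()
        if concept in seen:
            continue  # also guards against cycles in the source data
        seen.add(concept)
        for parent in parents.get(concept, ()):
            kept.add((concept, parent))
            frontier.append(parent)
    return kept


# Hypothetical sample data, not taken from the paper.
triples = [
    ("Wellington", BROADER, "Cities in New Zealand"),
    ("Cities in New Zealand", BROADER, "New Zealand"),
    ("Paris", BROADER, "Cities in France"),
]
focused = focus_taxonomy(triples, {"Wellington"})
```

In this toy run, the link for "Paris" is left out because no document concept leads to it. The disambiguation and relation-selection steps that the abstract describes are not modelled here.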

Ian H. Witten | Olena Medelyan | Jeen Broekstra | Anna Divoli | Steve Manion | Anna-Lan Huang

[1] Ian H. Witten,et al. An open-source toolkit for mining Wikipedia , 2013, Artif. Intell.

[2] Mónica Marrero,et al. Evaluation of Named Entity Extraction Systems , 2009 .

[3] Haixun Wang,et al. Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[4] W. Bruce Croft,et al. Deriving concept hierarchies from text , 1999, SIGIR '99.

[5] Christian Bizer,et al. DBpedia spotlight: shedding light on the web of documents , 2011, I-Semantics '11.

[6] Michael J. Witbrock,et al. Searching for Common Sense: Populating Cyc™ from the Web , 2005, AAAI.

[7] Gerhard Weikum,et al. YAGO: A Core of Semantic Knowledge , 2007, WWW '07.

[8] Praveen Paritosh,et al. Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[9] Lora Aroyo,et al. The Semantic Web: Research and Applications , 2009, Lecture Notes in Computer Science.

[10] James A. Hendler,et al. The Semantic Web — ISWC 2002 , 2002, Lecture Notes in Computer Science.

[11] Ian H. Witten,et al. Learning to link with wikipedia , 2008, CIKM '08.

[12] Marti A. Hearst,et al. Automating Creation of Hierarchical Faceted Metadata Structures , 2007, NAACL.

[13] Isabelle Augenstein,et al. LODifier: Generating Linked Data from Unstructured Text , 2012, ESWC.

[14] Simone Paolo Ponzetto,et al. Deriving a Large-Scale Taxonomy from Wikipedia , 2007, AAAI.

[15] Daniel Jurafsky,et al. Semantic Taxonomy Induction from Heterogenous Evidence , 2006, ACL.

[16] Marti A. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[17] Sharon A. Caraballo. Automatic construction of a hypernym-labeled noun hierarchy from text , 1999, ACL.

[18] Jens Lehmann,et al. DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[19] Panagiotis G. Ipeirotis,et al. Automatic Extraction of Useful Facet Hierarchies from Text Databases , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[20] Frank van Harmelen,et al. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema , 2002, SEMWEB.