Paper Information - Discovering Semantic Sibling Groups from Web Documents with XTREEM-SG

The acquisition of explicit semantics is still a research challenge. Approaches to extracting semantics focus mostly on learning hierarchical hypernym-hyponym relations. The extraction of co-hyponym and co-meronym sibling semantics receives far less attention, although these relations are no less important in ontology engineering. In this paper we describe and evaluate the XTREEM-SG (Xhtml TREE Mining – for Sibling Groups) approach for finding sibling semantics in semi-structured Web documents. XTREEM exploits the added value of the mark-up available in Web content to group text siblings, and we show that this grouping is semantically meaningful. The XTREEM-SG approach is domain and language independent: it does not rely on background knowledge, NLP software, or training. We apply XTREEM-SG and evaluate it against the reference semantics of two gold standard ontologies. We investigate how variations in input, parameters, and reference influence the results obtained when structuring a closed vocabulary by sibling relations. Earlier methods that evaluate sibling relations against a gold standard report an F-measure of 14.18%; our method raises this to 21.47%.
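The idea of using mark-up to group text siblings can be illustrated with a minimal sketch. This is not the XTREEM-SG implementation: the abstract does not specify the algorithm, so the grouping rule here (text of same-tag children under one parent element) and the function name `sibling_groups` are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def sibling_groups(xhtml):
    """Return lists of text items that share a parent element and a tag.

    Assumption (not from the paper): two text spans count as siblings
    when they are the direct text of same-tag children of one parent.
    """
    root = ET.fromstring(xhtml)  # requires well-formed XHTML
    groups = []
    for parent in root.iter():
        by_tag = defaultdict(list)
        for child in parent:
            text = (child.text or "").strip()
            if text:
                by_tag[child.tag].append(text)
        # Keep only groups with at least two members.
        groups.extend(g for g in by_tag.values() if len(g) > 1)
    return groups


doc = (
    "<html><body>"
    "<ul><li>Paris</li><li>Rome</li><li>Berlin</li></ul>"
    "<p>Intro</p>"
    "</body></html>"
)
print(sibling_groups(doc))
```

On this toy document the three list items form one candidate group, while the lone paragraph is discarded. Note that no dictionary, parser for natural language, or training data is involved, which is the property the abstract emphasizes.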

Myra Spiliopoulou | Marko Brunzel
