Wikipedia, a killer application of Web 2.0, has embraced the power of collaborative editing to harness collective intelligence. It features many attractive characteristics, such as an entity-based link graph, abundant categorization, and a semi-structured layout, and can serve as an ideal data source from which to extract high-quality, well-structured data. In this chapter, we first propose several solutions for extracting knowledge from Wikipedia. We consider not only information from the relational summaries of articles (infoboxes) but also information extracted semi-automatically from the article text using the structured content available. Because this task differs from information extraction from the Web, it raises new problems, such as the lack of redundancy in Wikipedia, which we address by extending traditional machine learning algorithms to work with little labeled data. Furthermore, we exploit the widespread categories as a complementary way to discover additional knowledge. Benefiting from both structured and textual information, we additionally provide a suggestion service for Wikipedia authoring. With the aim of facilitating semantic reuse, our proposal provides users with facilities such as link, category, and infobox content suggestions. The proposed enhancements can help attract more contributors and lighten the burden on professional editors. Finally, we develop an enhanced search system that eases the process of exploiting Wikipedia. To provide a user-friendly interface, it extends the faceted search interface with relation navigation and lets users express complex information needs interactively. To achieve efficient query answering, it extends scalable IR engines to index and search both the textual and structured information with integrated ranking support.