Weblogs as a source for extracting general world knowledge
Knowledge extraction (KE) efforts have often used corpora of heavily edited writing and sources written to provide the desired knowledge (e.g., newspapers or textbooks). However, the proliferation of diverse, up-to-date, unedited writing on the Web, especially in weblogs, offers new challenges for KE tools. We describe our efforts to extract general knowledge implicit in this noisy data and examine whether such sources can be an adequate substitute for resources like Wikipedia.