Victor: the Web-Page Cleaning Tool
[1] Kenneth R. Beesley,et al. Language Identifier: A Computer Program for Automatic Natural-Language Identification of On-line Text , 1988 .
[2] Tok Wang Ling,et al. IntelliClean: a knowledge-based intelligent data cleaner , 2000, KDD '00.
[3] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[4] Jan-Ming Ho,et al. Discovering informative content blocks from Web documents , 2002, KDD.
[5] Ziv Bar-Yossef,et al. Template detection via data mining and its applications , 2002, WWW.
[6] Bing Liu,et al. Web Page Cleaning for Web Mining through Feature Weighting , 2003, IJCAI.
[7] A. K. Singh,et al. An Efficient Method of Eliminating Noisy Information in Web Pages for Data Mining , 2004, CIT.
[8] Liang Chen,et al. Template detection for large scale search engines , 2006, SAC '06.
[9] Pavel Pecina,et al. Web Page Cleaning with Conditional Random Fields , 2007 .
[10] K. Hofmann,et al. Web Corpus Cleaning using Content and Structure , 2007 .
[11] Martin Schmidt,et al. FIASCO: Filtering the Internet by Automatic Subtree Classification, Osnabrück , 2007 .
[12] Drahomíra Johanka Spoustová. Combining Statistical and Rule-Based Approaches to Morphological Tagging of Czech Texts , 2008, Prague Bull. Math. Linguistics.