Automatic Web Content Extraction by Combination of Learning and Grouping
Shanchan Wu | Jian Fan | Jerry Liu
[1] Jian Pei, et al. Can we learn a template-independent wrapper for news article extraction from a single training site?, 2009, KDD.
[2] Jiawei Han, et al. CETR: content extraction via tag ratios, 2010, WWW '10.
[3] Shuming Shi, et al. Title extraction from bodies of HTML documents and its application to web page retrieval, 2005, SIGIR '05.
[4] Jan-Ming Ho, et al. Discovering informative content blocks from Web documents, 2002, KDD.
[5] Ziv Bar-Yossef, et al. Template detection via data mining and its applications, 2002, WWW.
[6] Alberto H. F. Laender, et al. Automatic web news extraction using tree edit distance, 2004, WWW '04.
[7] Jian Fan, et al. Automatic selection of print-worthy content for enhanced web page printing experience, 2010, DocEng '10.
[8] Enhong Chen, et al. Harnessing the wisdom of the crowds for accurate web page clipping, 2012, KDD.
[9] Liang Chen, et al. Template detection for large scale search engines, 2006, SAC '06.
[10] Wei-Ying Ma, et al. Extracting Content Structure for Web Pages Based on Visual Representation, 2003, APWeb.
[11] Peter Fankhauser, et al. Boilerplate detection using shallow text features, 2010, WSDM '10.
[12] Ping Luo, et al. Article clipper: a system for web article extraction, 2011, KDD.
[13] Dan Roth, et al. Extracting article text from the web with maximum subsequence segmentation, 2009, WWW '09.
[14] Wei-Ying Ma, et al. Learning block importance models for web pages, 2004, WWW '04.
[15] A. K. Singh, et al. An Efficient Method of Eliminating Noisy Information in Web Pages for Data Mining, 2004, CIT.
[16] Wolfgang Nejdl, et al. A densitometric approach to web page segmentation, 2008, CIKM '08.
[17] Ping Luo, et al. Web article extraction for web printing: a DOM+visual based approach, 2009, DocEng '09.
[18] Ming-Syan Chen, et al. Mining Web informative structures and contents based on entropy analysis, 2004, IEEE Transactions on Knowledge and Data Engineering.
[19] Berthier A. Ribeiro-Neto, et al. Computing block importance for searching on web sites, 2007, CIKM '07.
[20] Patrick Gallinari, et al. Document structure meets page layout: loopy random fields for web news content extraction, 2010, DocEng '10.
[21] Lejian Liao, et al. DOM based content extraction via text density, 2011, SIGIR.