Cluster-based page segmentation - a fast and precise method for web page pre-processing
[1] Alberto H. F. Laender, et al. Automatic web news extraction using tree edit distance, 2004, WWW '04.
[2] Juliana Freire, et al. On Finding Templates on Web Collections, 2009, World Wide Web.
[3] not Cwi, et al. XHTML™ 1.0 The Extensible HyperText Markup Language, 2002.
[4] Wei-Ying Ma, et al. VIPS: a Vision-based Page Segmentation Algorithm, 2003.
[5] Wei-Ying Ma, et al. Improving pseudo-relevance feedback in web information retrieval using web page segmentation, 2003, WWW '03.
[6] Andrew Tomkins, et al. The volume and evolution of web page templates, 2005, WWW '05.
[7] Robert L. Grossman, et al. Mining data records in Web pages, 2003, KDD '03.
[8] Vangelis Karkaletsis, et al. Segmenting HTML pages using visual and semantic information, 2008.
[9] Juliana Freire, et al. A fast and robust method for web page template detection and removal, 2006, CIKM '06.
[10] Xiaoli Li, et al. Eliminating noisy information in Web pages for data mining, 2003, KDD '03.
[11] Jer Lang Hong, et al. Information extraction for search engines using fast heuristic techniques, 2010, Data Knowl. Eng.
[12] Thomas Gottron. Bridging the gap: from multi document Template Detection to single document Content Extraction, 2008, EuroIMSA 2008.
[13] Radek Burget. Layout Based Information Extraction from HTML Documents, 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).
[14] Wei Liu, et al. ViDE: A Vision-Based Approach for Deep Web Data Extraction, 2010, IEEE Transactions on Knowledge and Data Engineering.
[15] Eduardo Sany Laber, et al. A fast and simple method for extracting relevant content from news webpages, 2009, CIKM.
[16] Miroslav Spousta, et al. Victor: the Web-Page Cleaning Tool, 2008.
[17] Gabriel Valiente, et al. An Efficient Bottom-Up Distance between Trees, 2001, SPIRE.