Navigation objects extraction for better content structure understanding
Jiajun Bu | Can Wang | Kui Zhao | Zilun Peng | Bangpeng Li
[1] Patrick Haffner,et al. Support vector machines for histogram-based image classification, 1999, IEEE Trans. Neural Networks.
[2] Lior Rokach,et al. Data Mining and Knowledge Discovery Handbook, 2005.
[3] Xiaoli Li,et al. Eliminating noisy information in Web pages for data mining , 2003, KDD '03.
[4] Shanchan Wu,et al. Automatic Web Content Extraction by Combination of Learning and Grouping , 2015, WWW.
[5] William M. Rand,et al. Objective Criteria for the Evaluation of Clustering Methods, 1971.
[6] Matthias Keller,et al. MenuMiner: revealing the information architecture of large web sites by analyzing maximal cliques , 2012, WWW.
[7] Robert P. W. Duin,et al. Feature Scaling in Support Vector Data Descriptions, 2000.
[8] Camille Roth,et al. Natural Scales in Geographical Patterns , 2017, Scientific Reports.
[9] Adam Kilgarriff,et al. Cleaneval: a Competition for Cleaning Web Pages , 2008, LREC.
[10] Keishi Tajima,et al. Extracting Logical Hierarchical Structure of HTML Documents Based on Headings, 2015, Proc. VLDB Endow.
[11] Jiawei Han,et al. CETR: content extraction via tag ratios , 2010, WWW '10.
[12] Rajeev Motwani,et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.
[13] Natasa Milic-Frayling,et al. Link Structure Graphs for Representing and Analyzing Web Sites, 2006.
[14] A. F. R. Rahman,et al. Content Extraction from HTML Documents, 2001.
[15] Ji-Rong Wen,et al. Template-Independent News Extraction Based on Visual Consistency , 2007, AAAI.
[16] C. Lee Giles,et al. Accessibility of information on the web , 1999, Nature.
[17] Dan Roth,et al. Extracting article text from the web with maximum subsequence segmentation , 2009, WWW '09.
[18] Hannes Hartenstein,et al. Search result presentation: supporting post-search navigation by integration of taxonomy data , 2013, WWW '13 Companion.
[19] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[20] Jan-Ming Ho,et al. Discovering informative content blocks from Web documents , 2002, KDD.
[21] Wei-Ying Ma,et al. VIPS: a Vision-based Page Segmentation Algorithm, 2003.
[22] Christopher C. Yang,et al. Web site topic-hierarchy generation based on link structure , 2009, J. Assoc. Inf. Sci. Technol..
[23] L. Hubert,et al. Comparing partitions, 1985.
[24] Liang Chen,et al. Template detection for large scale search engines , 2006, SAC '06.
[25] Jiawei Han,et al. Hierarchical Web-Page Clustering via In-Page and Cross-Page Link Structures , 2010, PAKDD.
[26] Andrei Z. Broder,et al. Graph structure in the Web , 2000, Comput. Networks.
[27] Albert-László Barabási,et al. Internet: Diameter of the World-Wide Web , 1999, Nature.
[28] Thomas M. Cover,et al. Elements of Information Theory, 2005.
[29] Lejian Liao,et al. DOM based content extraction via text density , 2011, SIGIR.
[30] Ravi Kumar,et al. Hierarchical topic segmentation of websites , 2006, KDD '06.
[31] James Bailey,et al. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance, 2010, J. Mach. Learn. Res.
[32] Jian Pei,et al. Can we learn a template-independent wrapper for news article extraction from a single training site?, 2009, KDD.
[33] Rui Xu,et al. Survey of clustering algorithms , 2005, IEEE Transactions on Neural Networks.
[34] Jon Kleinberg,et al. The Structure of the Web , 2001, Science.
[35] Ziv Bar-Yossef,et al. Template detection via data mining and its applications , 2002, WWW.