qRead: A fast and accurate article extraction method from web pages using partition features optimizations
[1] Ben Wellner, et al. Adaptive web-page content identification, 2007, WIDM '07.
[2] Rostislav Khlebnikov, et al. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016.
[3] Andrew Tomkins, et al. The volume and evolution of web page templates, 2005, WWW '05.
[4] Ming-Syan Chen, et al. Mining Web informative structures and contents based on entropy analysis, 2004, IEEE Transactions on Knowledge and Data Engineering.
[5] Valter Crescenzi, et al. RoadRunner: Towards Automatic Data Extraction from Large Web Sites, 2001, VLDB.
[6] Yan Guo, et al. ECON: An Approach to Extract Content from Web News Page, 2010, 2010 12th International Asia-Pacific Web Conference.
[7] Hayri Volkan Agun, et al. A hybrid approach for extracting informative content from web pages, 2013, Inf. Process. Manag.
[8] Ming-Syan Chen, et al. Entropy-based link analysis for mining web informative structures, 2002, CIKM '02.
[9] Sam Liu, et al. Web document text and images extraction using DOM analysis and natural language processing, 2009, DocEng '09.
[10] Shumeet Baluja, et al. Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework, 2006, WWW '06.
[11] Brad Adelberg, et al. NoDoSE - A Tool for Semi-Automatically Extracting Semi-Structured Data from Text Documents, 1998, SIGMOD Conference.
[12] Deepayan Chakrabarti, et al. Page-level template detection via isotonic smoothing, 2007, WWW '07.
[13] Ziv Bar-Yossef, et al. Template detection via data mining and its applications, 2002, WWW.
[14] Calton Pu, et al. XWRAP: an XML-enabled wrapper construction system for Web information sources, 2000, Proceedings of 16th International Conference on Data Engineering.
[15] Tim Weninger, et al. Text Extraction from the Web via Text-to-Tag Ratio, 2008, 2008 19th International Workshop on Database and Expert Systems Applications.
[16] Dan Roth, et al. Extracting article text from the web with maximum subsequence segmentation, 2009, WWW '09.
[17] Andreas Paepcke, et al. Coreex: content extraction from online news articles, 2008, CIKM '08.
[18] Sandip Debnath, et al. Automatic identification of informative sections of Web pages, 2005, IEEE Transactions on Knowledge and Data Engineering.
[19] Michal Skubacz, et al. Content Extraction from News Pages Using Particle Swarm Optimization on Linguistic and Structural Features, 2007.
[20] Peter Fankhauser, et al. Boilerplate detection using shallow text features, 2010, WSDM '10.
[21] Xiaoli Li, et al. Eliminating noisy information in Web pages for data mining, 2003, KDD '03.
[22] Jie Wang, et al. Handling Big Data of Online Social Networks on a Small Machine, 2014, COCOON.