Language independent web news extraction system based on text detection framework
[1] Wei Liu,et al. ViDE: A Vision-Based Approach for Deep Web Data Extraction , 2010, IEEE Transactions on Knowledge and Data Engineering.
[2] Ben Wellner,et al. Adaptive web-page content identification , 2007, WIDM '07.
[3] Thomas Gottron. Combining content extraction heuristics: the CombinE system , 2008, iiWAS.
[4] Yiming Yang,et al. A re-examination of text categorization methods , 1999, SIGIR '99.
[5] Franz Schweiggert,et al. Extracting the Main Content of Web Documents based on a Naive Smoothing Method , 2011, KDIR.
[6] Jian Pei,et al. News article extraction with template-independent wrapper , 2009, WWW '09.
[7] Alberto H. F. Laender,et al. Automatic web news extraction using tree edit distance , 2004, WWW '04.
[8] Barry Smyth,et al. Fact or Fiction: Content Classification for Digital Libraries , 2001, DELOS.
[9] Peter Fankhauser,et al. Boilerplate detection using shallow text features , 2010, WSDM '10.
[10] Daniel S. Hirschberg,et al. A linear space algorithm for computing maximal common subsequences , 1975, Commun. ACM.
[11] Craig A. Knoblock,et al. Hierarchical Wrapper Induction for Semistructured Information Sources , 2004, Autonomous Agents and Multi-Agent Systems.
[12] Lejian Liao,et al. DOM based content extraction via text density , 2011, SIGIR.
[13] Calton Pu,et al. XWRAP: an XML-enabled wrapper construction system for Web information sources , 2000, Proceedings of 16th International Conference on Data Engineering.
[14] Lishuang Li,et al. Two-phase biomedical named entity recognition using CRFs , 2009, Comput. Biol. Chem.
[15] Marcos André Gonçalves,et al. Using structural information to improve search in Web collections , 2010.
[16] A. F. R. Rahman,et al. Content Extraction from HTML Documents , 2001.
[17] Klaus Berberich,et al. Mind the gap: large-scale frequent sequence mining , 2013, SIGMOD '13.
[18] Nasrullah Memon,et al. Hybrid model of content extraction , 2012, J. Comput. Syst. Sci.
[19] Eduardo Sany Laber,et al. An efficient language-independent method to extract content from news webpages , 2011, DocEng '11.
[20] Ji-Rong Wen,et al. Template-Independent News Extraction Based on Visual Consistency , 2007, AAAI.
[21] Rainer Lienhart,et al. Localizing and segmenting text in images and videos , 2002, IEEE Trans. Circuits Syst. Video Technol.
[22] Wei-Ying Ma,et al. Extracting Content Structure for Web Pages Based on Visual Representation , 2003, APWeb.
[23] Franz Schweiggert,et al. TitleFinder: extracting the headline of news web pages based on cosine similarity and overlap scoring similarity , 2012, WIDM '12.
[24] Brad Adelberg,et al. NoDoSE - A Tool for Semi-Automatically Extracting Semi-Structured Data from Text Documents , 1998, SIGMOD Conference.
[25] Franz Schweiggert,et al. Extracting the Main Content of Web Documents Based on Character Encoding and a Naive Smoothing Method , 2011, ICSOFT.
[26] Pavel Pecina,et al. Web Page Cleaning with Conditional Random Fields , 2007.
[27] Saleh Alshomrani,et al. Bi-languages Mining Algorithm for Extraction Useful Web Contents (BiLEx) , 2015.
[28] Dan Roth,et al. Extracting article text from the web with maximum subsequence segmentation , 2009, WWW '09.
[29] Michael R. Lyu,et al. A comprehensive method for multilingual video text detection, localization, and extraction , 2005, IEEE Transactions on Circuits and Systems for Video Technology.
[30] Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.
[31] Jiawei Han,et al. CETR: content extraction via tag ratios , 2010, WWW '10.
[32] Hayri Volkan Agun,et al. An effective and efficient Web content extractor for optimizing the crawling process , 2013, Softw. Pract. Exp.
[33] Eduardo Sany Laber,et al. A fast and simple method for extracting relevant content from news webpages , 2009, CIKM.
[34] Enrique Herrera-Viedma,et al. Sentiment analysis: A review and comparative analysis of web services , 2015, Inf. Sci.
[35] Thomas Gottron,et al. Content Code Blurring: A New Approach to Content Extraction , 2008, 2008 19th International Workshop on Database and Expert Systems Applications.
[36] Xiaowei Wang,et al. News Information Extraction Based on Adaptive Weighting Using Unsupervised Bayesian Algorithm , 2011, WISM.
[37] Jiangfeng Chen,et al. CELB: Content extraction based on line-block , 2011, 2011 6th International Conference on Computer Sciences and Convergence Information Technology (ICCIT).
[38] Stefan Evert. A Lightweight and Efficient Tool for Cleaning Web Pages , 2008, LREC.