STEM: a suffix tree-based method for web data records extraction
Reynold Cheng | Zhiqiang Zhang | Yixiang Fang | Xiaofeng Zhang | Xiaoqin Xie
[1] Gail E. Kaiser, et al. Automating Content Extraction of HTML Documents, 2005, World Wide Web.
[2] Georg Gottlob, et al. Scalable Web Data Extraction for Online Market Intelligence, 2009, Proc. VLDB Endow.
[3] Wei Liu, et al. ViDE: A Vision-Based Approach for Deep Web Data Extraction, 2010, IEEE Transactions on Knowledge and Data Engineering.
[4] M. Farach. Optimal suffix tree construction with large alphabets, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.
[5] Lidong Bing, et al. Towards a unified solution: data record region detection and segmentation, 2011, CIKM '11.
[6] Rajeev Rastogi, et al. Web-scale information extraction with vertex, 2011, 2011 IEEE 27th International Conference on Data Engineering.
[7] Gerhard Weikum, et al. Combining information extraction and human computing for crowdsourced knowledge acquisition, 2014, 2014 IEEE 30th International Conference on Data Engineering.
[8] Lidong Bing, et al. Robust detection of semi-structured web records using a DOM structure-knowledge-driven model, 2013, TWEB.
[9] Bing Liu, et al. NET - A System for Extracting Web Data from Flat and Nested Data Records, 2005, WISE.
[10] Tim Furche, et al. DIADEM: Thousands of Websites to a Single Database, 2014, Proc. VLDB Endow.
[11] Umeshwar Dayal, et al. PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth, 2001, ICDE 2001.
[12] Hector Garcia-Molina, et al. Extracting structured data from Web pages, 2003, SIGMOD '03.
[13] Yunming Ye, et al. Detecting hot topics from Twitter: A multiview approach, 2014, J. Inf. Sci.
[14] Sachio Hirokawa, et al. Testbed for information extraction from deep web, 2004, WWW Alt. '04.
[15] Clement T. Yu, et al. Automatic extraction of dynamic record sections from search engine result pages, 2006, VLDB.
[16] Rafael Corchuelo, et al. A Survey on Region Extractors from Web Documents, 2013, IEEE Transactions on Knowledge and Data Engineering.
[17] M. Crochemore, et al. On-line construction of suffix trees, 2002.
[18] Tim Weninger, et al. Text Extraction from the Web via Text-to-Tag Ratio, 2008, 2008 19th International Workshop on Database and Expert Systems Applications.
[19] Calton Pu, et al. XWRAP: an XML-enabled wrapper construction system for Web information sources, 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).
[20] Chia-Hui Chang, et al. IEPAD: information extraction based on pattern discovery, 2001, WWW '01.
[21] Jiawei Han, et al. CETR: content extraction via tag ratios, 2010, WWW '10.
[22] Berthier A. Ribeiro-Neto, et al. A brief survey of web data extraction tools, 2002, SIGMOD Rec.
[23] Mohammed Kayed. Peer Matrix Alignment: A New Algorithm, 2012, PAKDD.
[24] Dan Roth, et al. Extracting article text from the web with maximum subsequence segmentation, 2009, WWW '09.
[25] Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.
[26] Lejian Liao, et al. DOM based content extraction via text density, 2011, SIGIR.
[27] Lejian Liao, et al. A hybrid approach for content extraction with text density and visual importance of DOM nodes, 2013, Knowledge and Information Systems.
[28] Ji-Rong Wen, et al. Efficient record-level wrapper induction, 2009, CIKM.
[29] Pasquale De Meo, et al. Web Data Extraction, Applications and Techniques: A Survey, 2010.
[30] Woong-Kee Loh, et al. A Storage-Efficient Suffix Tree Construction Algorithm for Human Genome Sequences, 2011, IEICE Trans. Inf. Syst.
[31] Qiming Chen, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth, 2001, Proceedings 17th International Conference on Data Engineering.
[32] Doug Downey, et al. Web-scale information extraction in knowitall: (preliminary results), 2004, WWW '04.
[33] Wolfgang Gatterbauer, et al. Towards domain-independent information extraction from web tables, 2007, WWW '07.
[34] Shanchan Wu, et al. Automatic Web Content Extraction by Combination of Learning and Grouping, 2015, WWW.
[35] Khaled Shaalan, et al. A Survey of Web Information Extraction Systems, 2006, IEEE Transactions on Knowledge and Data Engineering.
[36] Gonzalo Navarro, et al. A guided tour to approximate string matching, 2001, CSUR.
[37] Vijay V. Raghavan, et al. Fully automatic wrapper generation for search engines, 2005, WWW '05.
[38] Louise E. Moser, et al. Extracting data records from the web using tag path clustering, 2009, WWW '09.
[39] Bing Liu, et al. Web data extraction based on partial tree alignment, 2005, WWW '05.
[40] Donato Malerba, et al. HyLiEn: a hybrid approach to general list extraction on the web, 2011, WWW.
[41] Robert L. Grossman, et al. Mining data records in Web pages, 2003, KDD '03.
[42] Georg Lausen, et al. ViPER: augmenting automatic information extraction with visual perceptions, 2005, CIKM '05.
[43] Bing Liu, et al. Structured Data Extraction from the Web Based on Partial Tree Alignment, 2006, IEEE Transactions on Knowledge and Data Engineering.
[44] Xiaotie Deng, et al. A new suffix tree similarity measure for document clustering, 2007, WWW '07.
[45] Valter Crescenzi, et al. ALFRED: crowd assisted data extraction, 2013, WWW '13 Companion.
[46] Arbee L. P. Chen, et al. Efficient frequent sequence mining by a dynamic strategy switching algorithm, 2008, The VLDB Journal.
[47] Li Li, et al. Extracting data records from web using suffix tree, 2012, MDS '12.
[48] Gail E. Kaiser, et al. DOM-based content extraction of HTML documents, 2003, WWW '03.
[49] Calton Pu, et al. A fully automated object extraction system for the World Wide Web, 2001, Proceedings 21st International Conference on Distributed Computing Systems.
[50] Oren Etzioni, et al. Web document clustering: a feasibility demonstration, 1998, SIGIR '98.
[51] Jianyong Wang, et al. Mining sequential patterns by pattern-growth: the PrefixSpan approach, 2004, IEEE Transactions on Knowledge and Data Engineering.
[52] Ravi Kumar, et al. Automatic Wrappers for Large Scale Web Extraction, 2011, Proc. VLDB Endow.
[53] Ronald I. Greenberg. Bounds on the Number of Longest Common Subsequences, 2003, ArXiv.
[54] Wei-Ying Ma, et al. Extracting Content Structure for Web Pages Based on Visual Representation, 2003, APWeb.
[55] Roberto Grossi, et al. Suffix trees and their applications in string algorithms, 1993.
[56] Yi Liu, et al. Combining Tag and Value Similarity for Data Extraction and Alignment, 2012, IEEE Transactions on Knowledge and Data Engineering.