TEXT: Automatic Template Extraction from Heterogeneous Web Pages
[1] Vijay V. Raghavan, et al. Fully automatic wrapper generation for search engines, 2005, WWW '05.
[2] S. Muthukrishnan, et al. Selectivity estimation for Boolean queries, 2000, PODS '00.
[3] Alan M. Frieze, et al. Min-Wise Independent Permutations, 2000, J. Comput. Syst. Sci.
[4] Valter Crescenzi, et al. Clustering Web pages based on their structure, 2005, Data Knowl. Eng.
[5] Juliana Freire, et al. A fast and robust method for web page template detection and removal, 2006, CIKM '06.
[6] Ruihua Song, et al. Joint optimization of wrapper generation and template detection, 2007, KDD '07.
[7] J. Rissanen, et al. Modeling by Shortest Data Description, 1978, Automatica.
[8] Junghoo Cho, et al. RankMass crawler: a crawler with high personalized PageRank coverage guarantee, 2007, VLDB '07.
[9] Xiang Zhang, et al. CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition, 2008, SIGMOD '08.
[10] Clement T. Yu, et al. Automatic extraction of dynamic record sections from search engine result pages, 2006, VLDB '06.
[11] Mark D. Plumbley. Clustering of Sparse Binary Data using a Minimum Description Length Approach, 2002.
[12] Andrew Tomkins, et al. The volume and evolution of web page templates, 2005, WWW '05.
[13] Valter Crescenzi, et al. RoadRunner: Towards Automatic Data Extraction from Large Web Sites, 2001, VLDB '01.
[14] Kyuseok Shim, et al. XTRACT: a system for extracting document type descriptors from XML documents, 2000, SIGMOD '00.
[15] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[16] Inderjit S. Dhillon, et al. Information-theoretic co-clustering, 2003, KDD '03.
[17] Ziv Bar-Yossef, et al. Template detection via data mining and its applications, 2002, WWW '02.
[18] Tobias Dönz. Extracting Structured Data from Web Pages, 2003.
[19] Jorma Rissanen, et al. Stochastic Complexity in Statistical Inquiry, 1989, World Scientific Series in Computer Science.
[20] Bing Liu, et al. Web data extraction based on partial tree alignment, 2005, WWW '05.
[21] Deepayan Chakrabarti, et al. Page-level template detection via isotonic smoothing, 2007, WWW '07.
[22] Kristina Lerman, et al. Using the structure of Web sites for automatic segmentation of tables, 2004, SIGMOD '04.
[23] Alberto H. F. Laender, et al. Automatic web news extraction using tree edit distance, 2004, WWW '04.
[24] Philip S. Yu, et al. Co-clustering by block value decomposition, 2005, KDD '05.