[1] Nilesh N. Dalvi,et al. Robust web extraction: an approach based on a probabilistic tree-edit model , 2009, SIGMOD Conference.
[2] Robert L. Grossman,et al. Mining data records in Web pages , 2003, KDD '03.
[3] Mehmet A. Orgun,et al. Separating XHTML content from navigation clutter using DOM-structure block analysis , 2005, HYPERTEXT '05.
[4] Thomas Gottron,et al. Readability and the Web , 2012, Future Internet.
[5] Sunita Sarawagi,et al. Annotating and searching web tables using entities, types and relationships , 2010, Proc. VLDB Endow.
[6] Thomas Gottron. Combining content extraction heuristics: the CombinE system , 2008, iiWAS.
[7] Andrew Tomkins,et al. The volume and evolution of web page templates , 2005, WWW '05.
[8] Valter Crescenzi,et al. RoadRunner: Towards Automatic Data Extraction from Large Web Sites , 2001, VLDB.
[9] Sandip Debnath,et al. Automatic extraction of informative blocks from webpages , 2005, SAC '05.
[10] Jiawei Han,et al. CETR: content extraction via tag ratios , 2010, WWW '10.
[11] Daisy Zhe Wang,et al. WebTables: exploring the power of tables on the web , 2008, Proc. VLDB Endow.
[12] Jayant Madhavan,et al. Structured Data on the Web , 2009, 2010 12th International Asia-Pacific Web Conference.
[13] Nicholas Kushmerick,et al. Learning to remove Internet advertisements , 1999, AGENTS '99.
[14] Adam Kilgarriff,et al. Cleaneval: a Competition for Cleaning Web Pages , 2008, LREC.
[15] Thomas Gottron,et al. Content Code Blurring: A New Approach to Content Extraction , 2008, 2008 19th International Workshop on Database and Expert Systems Applications.
[16] Rahul Gupta,et al. Answering Table Augmentation Queries from Unstructured Lists on the Web , 2009, Proc. VLDB Endow.
[17] I. V. Ramakrishnan,et al. Computational aspects of resilient data extraction from semistructured sources (extended abstract) , 2000, PODS '00.
[18] Barry Smyth,et al. Fact or Fiction: Content Classification for Digital Libraries , 2001, DELOS.
[19] Ziv Bar-Yossef,et al. Template detection via data mining and its applications , 2002, WWW.
[20] Bing Liu,et al. Web data extraction based on partial tree alignment , 2005, WWW '05.
[21] Boris Chidlovskii,et al. Documentum ECI self-repairing wrappers: performance analysis , 2006, SIGMOD Conference.
[22] Wei Li,et al. QuASM: a system for question answering using semi-structured data , 2002, JCDL '02.
[23] Peter Fankhauser,et al. Boilerplate detection using shallow text features , 2010, WSDM '10.
[24] Valter Crescenzi,et al. Wrapper Inference for Ambiguous Web Pages , 2008, Appl. Artif. Intell.
[25] Calton Pu,et al. Wrapping web data into XML , 2001, SGMD.
[26] Sharma Chakravarthy,et al. Automating Change Detection and Notification of Web Pages (Invited Paper) , 2006, 17th International Workshop on Database and Expert Systems Applications (DEXA'06).
[27] Sunita Sarawagi,et al. Answering Table Queries on the Web using Column Keywords , 2012, Proc. VLDB Endow.
[28] Aditya G. Parameswaran,et al. Optimal schemes for robust web extraction , 2011, Proc. VLDB Endow.