Identification of Duplicate News Stories in Web Pages
[1] Hassan Alam,et al. Understanding the Flow of Content in Summarizing HTML Documents , 2001 .
[2] Geoffrey Zweig,et al. Syntactic Clustering of the Web , 1997, Comput. Networks.
[3] Monika Henzinger,et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.
[4] Gerard Salton,et al. A vector space model for automatic indexing , 1975, CACM.
[5] Nicholas Kushmerick,et al. Wrapper induction: Efficiency and expressiveness , 2000, Artif. Intell.
[6] Gail E. Kaiser,et al. Automating Content Extraction of HTML Documents , 2005, World Wide Web.
[7] Lynette Hirschman,et al. A Model-Theoretic Coreference Scoring Scheme , 1995, MUC.
[8] Berthier A. Ribeiro-Neto,et al. A brief survey of web data extraction tools , 2002, SIGMOD Rec.
[9] J. Hanley,et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.
[10] Lluís Màrquez i Villodre,et al. Semantic Role Labeling as Sequential Tagging , 2005, CoNLL.
[11] Gregory Grefenstette,et al. Web as Corpus , 2003 .
[12] Altigran S. da Silva,et al. A brief survey of web data extraction tools , 2002 .
[13] Ben Wellner,et al. Adaptive web-page content identification , 2007, WIDM '07.
[14] Alvaro E. Monge. Matching Algorithms within a Duplicate Detection System , 2000, IEEE Data Engineering Bulletin.
[15] Marc Najork,et al. On the evolution of clusters of near-duplicate Web pages , 2003, LA-WEB.
[16] Fernando Pereira,et al. Shallow Parsing with Conditional Random Fields , 2003, NAACL.
[17] Ben Wellner,et al. Leveraging Machine Readable Dictionaries in Discriminative Sequence Models , 2006, LREC.
[18] Jason Baldridge,et al. A Sequencing Model for Situation Entity Classification , 2007, ACL.
[19] Joshua Alspector,et al. Improved robustness of signature-based near-replica detection via lexicon randomization , 2004, KDD.
[20] Joongmin Choi,et al. MetaNews: An Information Agent for Gathering News Articles on the Web , 2003, ISMIS.
[21] Ophir Frieder,et al. Collection statistics for fast duplicate document detection , 2002, TOIS.
[22] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.
[23] Breck Baldwin,et al. Algorithms for Scoring Coreference Chains , 1998 .
[24] Craig A. Knoblock,et al. A hierarchical approach to wrapper induction , 1999, AGENTS '99.
[25] W. Bruce Croft,et al. Table extraction using conditional random fields , 2003, DG.O.
[26] Judith L. Klavans,et al. Columbia Newsblaster: Multilingual News Summarization on the Web , 2004, NAACL.