A Data Cleaning Method for CiteSeer Dataset
Jianguo Lu | Tong Zhou | Hao Zhang | Yan Wang | Yaxin Li | Deyun Wang | Yanlin Ma
[1] Craig A. Knoblock, et al. Learning object identification rules for information integration, 2001, Inf. Syst.
[2] William W. Cohen, et al. A Comparison of String Metrics for Matching Names and Records, 2003.
[3] Lise Getoor, et al. Collective Classification in Network Data, 2008, AI Mag.
[4] Cornelia Caragea, et al. Classifying Scientific Publications Using Abstract Features, 2011, SARA.
[5] Madian Khabsa, et al. The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective, 2014, 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing.
[6] Wenyi Huang, et al. Recommending citations: translating papers into references, 2012, CIKM.
[7] Ivan P. Fellegi, et al. A Theory for Record Linkage, 1969.
[8] Cornelia Caragea, et al. Automatic Identification of Research Articles from Crawled Documents, 2014, WSDM 2014.
[9] Chen Li, et al. Efficient record linkage in large data sets, 2003, Eighth International Conference on Database Systems for Advanced Applications (DASFAA 2003), Proceedings.
[10] Peter Christen, et al. A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication, 2012, IEEE Transactions on Knowledge and Data Engineering.
[11] Cornelia Caragea, et al. CiteSeerX: AI in a Digital Library Search Engine, 2014, AI Mag.
[12] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[13] Soumen Chakrabarti, et al. Mining the web - discovering knowledge from hypertext data, 2002.
[14] Hinrich Schütze, et al. Introduction to information retrieval, 2008.
[15] Erhard Rahm, et al. Data Cleaning: Problems and Current Approaches, 2000, IEEE Data Eng. Bull.
[16] William W. Cohen, et al. Learning to match and cluster large high-dimensional data sets for data integration, 2002, KDD.
[17] Dale Schuurmans, et al. Combining Naive Bayes and n-Gram Language Models for Text Classification, 2003, ECIR.
[18] Jöran Beel, et al. Evaluation of header metadata extraction approaches and tools for scientific PDF documents, 2013, JCDL '13.
[19] Andrew McCallum, et al. Efficient clustering of high-dimensional data sets with application to reference matching, 2000, KDD '00.
[20] Jianguo Lu, et al. TS-IDS Algorithm for Query Selection in the Deep Web Crawling, 2014, APWeb.
[21] Cornelia Caragea, et al. CiteSeerX: A Scholarly Big Dataset, 2014, ECIR.
[22] Madian Khabsa, et al. Digital commons, 2020, Internet Policy Rev.
[23] Cornelia Caragea, et al. Can't see the forest for the trees?: a citation recommendation system, 2013, JCDL '13.