Automatic Identification of Research Articles from Crawled Documents

Paper presented at the Web-Scale Classification: Classifying Big Data from the Web Workshop. It proposes novel features that yield effective and efficient classification models for automatically identifying research articles among crawled documents.
