URL-Based Web Page Classification: With n-Gram Language Models
[1] Robert Wing Pong Luk, et al. A Generative Theory of Relevance, 2008, The Information Retrieval Series.
[2] Stephen E. Robertson, et al. A probabilistic model of information retrieval: development and comparative experiments - Part 1, 2000, Inf. Process. Manag.
[3] Mark Craven, et al. Combining Statistical and Relational Methods for Learning in Hypertext Domains, 1998, ILP.
[4] Franco Salvetti, et al. Efficient spam analysis for weblogs through URL segmentation, 2007.
[5] Beatriz de la Iglesia, et al. URL-based Web Page Classification - A New Method for URL-based Web Page Classification Using n-Gram Language Models, 2014, KDIR.
[6] Sofia Stamou, et al. Keyword Identification within Greek URLs, 2011, Polytech. Open Libr. Int. Bull. Inf. Technol. Sci.
[7] David Vilar, et al. Dialogue act classification using a Bayesian approach, 2004.
[8] Masaru Kitsuregawa, et al. Topic Classification of Spam Host based on URLs, 2010.
[9] Monika Henzinger, et al. Web page language identification based on URLs, 2008, Proc. VLDB Endow.
[10] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.
[11] Monika Henzinger, et al. A Comprehensive Study of Techniques for URL-Based Web Page Language Classification, 2013, TWEB.
[12] Min-Yen Kan, et al. Fast webpage classification using URL features, 2005, CIKM '05.
[13] Dawn Xiaodong Song, et al. Design and Evaluation of a Real-Time URL Spam Filtering Service, 2011, 2011 IEEE Symposium on Security and Privacy.
[14] Monika Henzinger, et al. Purely URL-based topic classification, 2009, WWW '09.
[15] John D. Lafferty, et al. A study of smoothing methods for language models applied to Ad Hoc information retrieval, 2001, SIGIR '01.
[16] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.
[17] W. B. Cavnar, et al. N-gram-based text categorization, 1994.
[18] Stephen E. Robertson, et al. Relevance weighting of search terms, 1976, J. Am. Soc. Inf. Sci.
[19] Min-Yen Kan. Web page classification without the web page, 2004, WWW Alt. '04.
[20] Steven C. H. Hoi, et al. Cost-sensitive online active learning with application to malicious URL detection, 2013, KDD.
[21] Lawrence K. Saul, et al. Identifying suspicious URLs: an application of large-scale online learning, 2009, ICML '09.
[22] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953.
[23] Monika Henzinger, et al. A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification, 2011, TWEB.
[24] William S. Cooper, et al. Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval, 1995, TOIS.
[25] Dale Schuurmans, et al. Text Classification in Asian Languages without Word Segmentation, 2003, IRAL.
[26] Ian H. Witten, et al. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression, 1991, IEEE Trans. Inf. Theory.
[27] Egidio L. Terra. Simple Language Models for Spam Detection, 2005, TREC.