The Contribution of Stylistic Information to Content-based Mobile Spam Filtering

Content-based approaches to detecting mobile spam have so far focused mainly on the topical aspect of an SMS message (what it is about) rather than on its stylistic aspect (how it is written). In this paper, as a preliminary step, we investigate the utility of commonly used stylistic features, derived from shallow linguistic analysis, for learning mobile spam filters. Experimental results suggest that stylistic information is potentially effective for enhancing the performance of mobile spam filters.
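To make the notion of shallow stylistic features concrete, the following is a minimal Python sketch of the kind of surface statistics that could be extracted from a single SMS message. The function name `stylistic_features` and the particular features chosen (message length, average word length, and the proportions of uppercase, digit, and punctuation characters) are illustrative assumptions; the abstract does not specify which features were used.

```python
import string


def stylistic_features(message: str) -> dict:
    """Compute a few shallow stylistic features of one message.

    Hypothetical feature set for illustration only; it is not taken
    from the paper.
    """
    words = message.split()  # whitespace tokenization, no linguistic parsing
    n_chars = len(message)
    n_words = len(words)
    return {
        "n_chars": n_chars,
        "n_words": n_words,
        # Mean token length in characters (0.0 for an empty message).
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Character-class proportions over the whole message.
        "upper_ratio": sum(c.isupper() for c in message) / n_chars if n_chars else 0.0,
        "digit_ratio": sum(c.isdigit() for c in message) / n_chars if n_chars else 0.0,
        "punct_ratio": sum(c in string.punctuation for c in message) / n_chars if n_chars else 0.0,
    }


if __name__ == "__main__":
    print(stylistic_features("WIN 100 now!"))
```

Features of this kind describe how a message is written rather than what it is about, so they could in principle be appended to a topical (word-based) representation before training a filter.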
