Anomaly Detection in Dynamic Social Systems Using Weak Estimators

Anomaly detection involves identifying observations that deviate from the normal behavior of a system. One way to achieve this is to identify the phenomena that characterize “normal” observations. Based on the characteristics learned from these “normal” observations, new observations are then classified as either “normal” or not. Most state-of-the-art approaches, especially those that belong to the family of parameterized statistical schemes, assume that the underlying distributions of the observations are stationary. That is, they assume that the distributions learned during the training (or learning) phase, though unknown, do not vary with time, and that the same distributions remain relevant as new observations are encountered. Although such a “stationarity” assumption is appropriate for many applications, there are anomaly detection problems where stationarity cannot be assumed. For example, in network monitoring, the patterns learned to represent normal behavior may change over time due to factors such as network infrastructure expansion, new services, and growth of the user population. Similarly, in meteorology, identifying anomalous temperature patterns involves taking into account seasonal changes in normal observations. Detecting anomalies or outliers under these circumstances introduces several challenges. In particular, the detector must adapt to changes in a non-stationary environment so that anomalous observations can still be identified as the notion of “normal” behavior itself shifts. In this paper, we propose applying a family of weak estimators to anomaly detection in dynamic environments. In particular, we apply this theory to spam email detection. Our experimental results demonstrate that our proposal is both feasible and effective for the detection of such anomalous emails.
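The idea of an estimator that tracks a time-varying distribution can be illustrated with a minimal sketch. It assumes the multiplicative update commonly associated with stochastic-learning weak estimators for a binary event; the abstract does not state the update rule, so the rule itself, the function names, the parameter `lam`, and the synthetic stream are all illustrative assumptions rather than the paper's implementation.

```python
def weak_update(p, observed, lam=0.9):
    """One multiplicative update of the estimate p = P(event).

    Assumed rule (not taken from the abstract):
      event observed     -> p = lam * p + (1 - lam)
      event not observed -> p = lam * p
    A value of lam closer to 1 forgets the past more slowly.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return lam * p + (1.0 - lam) if observed else lam * p


def track(stream, p0=0.5, lam=0.9):
    """Return the sequence of estimates produced over a binary stream."""
    p, history = p0, []
    for x in stream:
        p = weak_update(p, x, lam)
        history.append(p)
    return history


def running_mean(stream):
    """Stationary baseline: fraction of events observed so far."""
    count, history = 0, []
    for n, x in enumerate(stream, start=1):
        count += 1 if x else 0
        history.append(count / n)
    return history


# Synthetic non-stationary stream: the event never occurs in the first
# half and always occurs in the second half (hypothetical data).
stream = [False] * 50 + [True] * 50
weak = track(stream)
mean = running_mean(stream)
print(f"weak estimate after switch: {weak[-1]:.3f}")
print(f"running mean after switch:  {mean[-1]:.3f}")
```

In this sketch the weak estimate follows the switch in the stream, while the running mean, which weights all past observations equally, settles halfway between the two regimes. That difference is the motivation for using such estimators when "normal" behavior drifts.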
