Web mining for cyber monitoring and filtering

Like any self-regulating environment, the Internet is fertile ground for abuse, ranging from get-rich-quick scams and the touting of illegal or adult-oriented material to the promotion of extremist or anarchist views and online pimping. Consequently, the ability to discreetly intercept and analyze Internet access has tremendous potential for shielding users, especially younger ones, from inappropriate content. This paper proposes one such system, the Web access monitoring and filtering (WAMF) system. The WAMF system comprises two main decoupled components: one for online monitoring and filtering, and the other for offline Web classification and data analysis. The former tracks, tallies, and selectively blocks user Web access in real time, whereas the latter employs Web mining techniques to classify Web pages into pre-defined user categories and to analyze user Web access data for behavior patterns. In this paper, we discuss the WAMF system and, in particular, Web mining techniques for adaptive Web page categorization.
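The split between the two components can be illustrated with a minimal Python sketch. It is not the WAMF implementation: the keyword lists, the names `classify_page` and `AccessMonitor`, and the example hosts are all hypothetical, and simple keyword counting stands in for the Web mining classifiers that the paper discusses. The point is only that the offline step produces category labels, while the online step tallies requests and consults those labels to decide whether to block.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical keyword lists; a stand-in for a trained page classifier.
CATEGORY_KEYWORDS = {
    "gambling": {"casino", "poker", "jackpot"},
    "adult": {"xxx", "explicit"},
}


def classify_page(text):
    """Offline step: return the category with the most keyword hits, or None."""
    words = text.lower().split()
    scores = {
        category: sum(word in keywords for word in words)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


class AccessMonitor:
    """Online step: tally every request and block hosts in blocked categories."""

    def __init__(self, blocked_categories):
        self.blocked_categories = set(blocked_categories)
        self.host_category = {}  # host -> category, filled in by the offline step
        self.tally = Counter()   # (user, host) -> number of requests seen

    def update_category(self, host, category):
        """Receive a label produced by the offline classifier."""
        self.host_category[host] = category

    def allow(self, user, url):
        """Record the access and report whether it should be let through."""
        host = urlparse(url).hostname
        self.tally[(user, host)] += 1
        # Unclassified hosts have no category and are allowed by default.
        return self.host_category.get(host) not in self.blocked_categories
```

Because the monitor only reads a host-to-category table, the classifier behind `classify_page` could be replaced or retrained offline without touching the real-time path, which is the kind of decoupling the abstract describes.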
