Intrusion detection in web applications using text mining

Information security has evolved from focusing solely on the network and server layers to also covering the web application layer, and security is a particularly sensitive subject for some types of web applications. Achieving a secure web application involves several distinct concerns, such as encrypting traffic and certain database information and strictly enforcing access control. In this work we focus on detecting attempts either to gain unauthorised access to a web application or to misuse it. We introduce an intrusion detection software component based on text-mining techniques. Using text categorisation, it learns the characteristics of both normal and malicious user behaviour from the log entries generated by the web application server. Misuse of the web application is therefore detected without the need for any explicit programming or code writing, which improves the maintainability of the system. Because telemedicine systems are usually critical in terms of the confidential information they handle and the responsibilities that derive from it, we apply and evaluate our methods on a real web-based telemedicine system called Arnasa.
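
To make the idea of categorising log entries concrete, the following is a minimal sketch, not the component described in this work. It assumes a multinomial Naive Bayes classifier with Laplace smoothing as the text categoriser, since the abstract does not say which classifier is used. The log entry format, the labels "normal" and "malicious", the training entries, and all names in the code are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(entry):
    """Split a log entry into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", entry.lower())


class NaiveBayesLogClassifier:
    """Multinomial Naive Bayes over log-entry tokens (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()               # entries seen per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocabulary = set()

    def train(self, labelled_entries):
        for text, label in labelled_entries:
            self.doc_counts[label] += 1
            for token in tokenize(text):
                self.token_counts[label][token] += 1
                self.vocabulary.add(token)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of Laplace-smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_tokens = sum(self.token_counts[label].values())
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled log entries; a real system would use server logs.
TRAINING_ENTRIES = [
    ("GET /patient/view id=12 user=nurse1 status=200", "normal"),
    ("POST /login user=doctor2 status=200", "normal"),
    ("GET /report/list user=doctor2 status=200", "normal"),
    ("GET /patient/view id=1 OR 1=1 user=guest status=500", "malicious"),
    ("POST /login user=admin password=' OR '1'='1 status=401", "malicious"),
    ("GET /admin/config user=guest status=403", "malicious"),
]

classifier = NaiveBayesLogClassifier()
classifier.train(TRAINING_ENTRIES)

if __name__ == "__main__":
    for entry in (
        "GET /report/list user=nurse1 status=200",
        "GET /admin/users user=guest status=403",
    ):
        print(entry, "->", classifier.classify(entry))
```

In this sketch the only behaviour-specific input is the set of labelled log entries, so adapting the detector to new kinds of misuse would mean retraining on new examples rather than writing new detection code. A toy training set like this one says nothing about real-world accuracy.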