A Decision Tree Algorithm for Distributed Data Mining: Towards Network Intrusion Detection

This paper presents preliminary work on an agent-based approach to distributed learning of decision trees. The approach is applied to the intrusion detection domain, in which interest has recently been growing. A network profile is built by applying a distributed data analysis method to data collected from distributed hosts. The method integrates inductive generalization and agent-based computing, so that classification rules are learned via tree induction from distributed data and used as intrusion profiles. Agents collaboratively generate partial trees and exchange intermediate results in the form of indices to the data records. Experimental results are presented for the military network domain data used for network intrusion detection in the KDD Cup 1999. Several experiments show that the distributed version of the decision tree algorithm performs much better than the non-distributed version, which operates on data collected manually from the distributed hosts.
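
The abstract does not spell out the algorithm, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes vertically partitioned data (each agent holds different attribute columns for the same records), class labels visible to every agent, and ID3-style information gain as the split criterion. The names `Agent`, `best_local_split`, `split`, and `build_tree`, as well as the toy records, are hypothetical. Only record indices and split proposals pass between agents; attribute values never leave their owner.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


class Agent:
    """Hypothetical agent that owns some attribute columns for all records."""

    def __init__(self, columns, labels):
        self.columns = columns  # {attribute name: values indexed by record id}
        self.labels = labels    # class labels, assumed visible to every agent

    def split(self, attr, indices):
        """Partition record indices by the value of a locally held attribute."""
        parts = defaultdict(list)
        for i in indices:
            parts[self.columns[attr][i]].append(i)
        return dict(parts)

    def best_local_split(self, indices, used):
        """Return (gain, attr) for the best unused local attribute, or None."""
        base = entropy([self.labels[i] for i in indices])
        best = None
        for attr in self.columns:
            if attr in used:
                continue
            parts = self.split(attr, indices)
            remainder = sum(
                len(p) / len(indices) * entropy([self.labels[i] for i in p])
                for p in parts.values())
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, attr)
        return best


def build_tree(agents, labels, indices, used=frozenset()):
    """Grow a tree by asking each agent for its best split on `indices`."""
    classes = Counter(labels[i] for i in indices)
    majority = classes.most_common(1)[0][0]
    if len(classes) == 1:
        return majority  # pure node becomes a leaf
    proposals = []
    for agent in agents:
        proposal = agent.best_local_split(indices, used)
        if proposal is not None:
            proposals.append((proposal[0], proposal[1], agent))
    if not proposals:
        return majority  # no attributes left anywhere
    gain, attr, owner = max(proposals, key=lambda p: p[0])
    if gain <= 1e-12:
        return majority  # no attribute separates the classes
    # The owning agent returns only index subsets, not attribute values.
    parts = owner.split(attr, indices)
    return {attr: {value: build_tree(agents, labels, subset, used | {attr})
                   for value, subset in parts.items()}}


# Toy records (hypothetical values, not taken from the KDD Cup 1999 data).
labels = ["normal", "attack", "attack", "normal"]
agent_a = Agent({"protocol": ["tcp", "udp", "udp", "tcp"]}, labels)
agent_b = Agent({"flag": ["SF", "SF", "S0", "S0"]}, labels)
tree = build_tree([agent_a, agent_b], labels, list(range(len(labels))))
print(tree)
```

In this toy run the attribute held by `agent_a` separates the two classes completely, so the root split is made there and each resulting index subset becomes a leaf.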
