Modeling of Network Computing Systems for Decision Tree Induction Tasks

As the amount of information grows rapidly, there is strong interest in efficient network computing systems, including Grids, public-resource computing systems, P2P systems, and cloud computing. In this paper we take a detailed look at the problem of modeling and optimizing network computing systems for parallel decision tree induction methods. First, we present a comprehensive discussion of decision tree induction methods, with a special focus on their parallel versions. Next, we propose a generic optimization model of a network computing system that can be used for distributed implementation of parallel decision trees. To illustrate our work, we provide results of numerical experiments showing that the distributed approach significantly improves system throughput.
