Mining Internet Data Sets for Computational Grids

Data mining methodology and tools are employed in many application areas. This paper proposes a novel application field for data mining research: the analysis and long-term forecasting of Internet performance, especially for the needs of Computational Grids. With data mining, Internet performance problems can be examined from new points of view, and sometimes understood better than with conventional data analysis methods. Knowledge about Internet performance was mined with a professional data mining package to build a decision model that advises on the further exploitation and usage scheduling of Grid links for a particular time and date. The results show that data mining can be used effectively in this research area.
