Improving network systems performance by clustering distributed database sites

Clustering network sites is a vital issue in parallel and distributed database systems (DDBSs). Grouping distributed database network sites into clusters is considered an efficient way to minimize the communication time required for query processing. However, clustering network sites is still an open research problem, since finding an optimal solution is NP-complete. The main goal in this field is therefore to find a near-optimal solution that groups distributed database network sites into disjoint clusters so as to minimize the communication time required for data allocation. Grouping a large number of network sites into a small number of clusters effectively reduces the transaction response time, results in better data distribution, and improves distributed database system performance. We present a novel algorithm for clustering distributed database network sites based on communication time, since database query processing is time dependent. Extensive experimental tests and simulations are conducted on this clustering algorithm. The experimental and simulation results show that a better network distribution is achieved, with significant improvements in server load balance and network delay, that communication time between network sites is reduced, and that distributed database system performance is improved.
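To make the idea of grouping sites into disjoint clusters by communication time concrete, the following is a minimal sketch only. It is not the algorithm presented in the paper, whose details the abstract does not give. It assumes a symmetric matrix of pairwise communication times and a hypothetical `threshold` parameter: a site joins the first cluster whose members are all within the threshold, and otherwise starts a new cluster. The function name `cluster_sites` and the sample times are illustrative assumptions.

```python
from typing import List


def cluster_sites(comm_time: List[List[float]], threshold: float) -> List[List[int]]:
    """Greedily group sites into disjoint clusters (illustrative heuristic).

    A site joins the first existing cluster in which its communication time
    to every member is at most `threshold`; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for site in range(len(comm_time)):
        for cluster in clusters:
            if all(comm_time[site][member] <= threshold for member in cluster):
                cluster.append(site)
                break
        else:
            # No existing cluster is close enough, so open a new one.
            clusters.append([site])
    return clusters


# Hypothetical communication times (in ms) between four sites.
comm_time = [
    [0, 2, 9, 8],
    [2, 0, 7, 9],
    [9, 7, 0, 3],
    [8, 9, 3, 0],
]
clusters = cluster_sites(comm_time, threshold=5)
print(clusters)  # [[0, 1], [2, 3]]
```

With this sample data, sites 0 and 1 end up together, as do sites 2 and 3, because only those pairs communicate within the assumed 5 ms threshold. A greedy pass like this depends on the order in which sites are visited and gives no optimality guarantee, which is consistent with the abstract's point that only near-optimal solutions are practical.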
