Novel and efficient clustering algorithm using structured query language

Clustering becomes an indispensable requirement when dealing with immense volumes of data. Database management tools do not provide an inbuilt mechanism to cluster datasets of this magnitude, so clustering inevitably requires an external module. This external module must be devised specifically to handle the data extracted from the data source. There are myriad clustering techniques, and the conventional approaches suffer from drawbacks such as excessive running time and high computational complexity. Computational complexity becomes a significant concern when data is extracted in bulk and clustering and filtering are then performed as a separate, subsequent operation. The proposed concept is designed to eliminate the redundancies inherent in this customary practice. The proposed solution exploits the processing power of the database management system by streamlining the SQL used for data extraction. Hence, a procedure is formulated that combines data retrieval and clustering into a single operation and leaves it to the DBMS, instead of letting the work spill over to the adjoining tiers.
