Handling Structured Data Using Data Mining Clustering Techniques

Organizations today can store extremely large amounts of data, and the continuous rise in data capture is turning these stores into huge repositories that are difficult to analyse. This constantly growing volume challenges researchers who try to discover knowledge in it. Valuable information lies buried in these collections and can be extracted with data mining techniques, which are able to dig embedded information out of large datasets. Many application areas require such techniques, and this demand has led to the evolution of many data mining methods. However, not all of these methods are capable of dealing with very high volumes of data. Numerous computation- and data-intensive scientific data analyses have been established to keep pace. Now that today's data has become Big Data, it requires large-scale data mining analyses that meet its scalability and performance requirements. To serve such data, several efficient parallel and concurrent algorithms have been applied. These parallel algorithms use different parallelization techniques to manage the large data volumes. Earlier techniques, such as threads and MPI, show different performance and usability characteristics. The MPI model is efficient for compute-intensive problems but difficult to put into practical use. Meanwhile, data mining continues to spread its roots in business and in learning organizations. The integrated clustering algorithm CURE is more robust to outliers and recognizes clusters that have irregular shapes and varying sizes. CURE combines random sampling and partitioning, which ensures that the clusters it produces are of better quality than those produced by earlier algorithms. This paper focuses on the CURE clustering technique, which was found suitable for working with large databases.
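
The combination of random sampling and partitioning described above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. The function names, the parameters (`sample_size`, `num_partitions`, `num_reps`, `shrink`) and the reduction factor of 3 used when pre-clustering each partition are assumptions made for illustration, and outlier elimination is omitted for brevity.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def representatives(points, num_reps, shrink):
    """Pick well-scattered points, then shrink them toward the centroid."""
    c = centroid(points)
    unique = list(dict.fromkeys(points))  # drop exact duplicates
    # Start with the point farthest from the centroid.
    chosen = [max(unique, key=lambda p: math.dist(p, c))]
    while len(chosen) < min(num_reps, len(unique)):
        rest = [p for p in unique if p not in chosen]
        # Next: the point farthest from all representatives chosen so far.
        chosen.append(max(rest, key=lambda p: min(math.dist(p, r) for r in chosen)))
    return [tuple(x + shrink * (cx - x) for x, cx in zip(p, c)) for p in chosen]


def agglomerate(clusters, k, num_reps, shrink):
    """Merge the two closest clusters (by representative distance) until k remain."""
    clusters = [list(c) for c in clusters]
    reps = [representatives(c, num_reps, shrink) for c in clusters]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in reps[i] for b in reps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        del clusters[j], reps[j]  # j > i, so index i stays valid
        clusters[i] = merged
        reps[i] = representatives(merged, num_reps, shrink)
    return clusters


def cure(points, k, sample_size, num_partitions=2, num_reps=4, shrink=0.3, seed=0):
    """Return one cluster label per input point (points are tuples of floats)."""
    rng = random.Random(seed)
    # Step 1: random sampling keeps the expensive merging step small.
    sample = rng.sample(points, min(sample_size, len(points)))
    # Step 2: partition the sample and pre-cluster each partition separately.
    partial = []
    for i in range(num_partitions):
        part = sample[i::num_partitions]
        if part:
            target = max(1, len(part) // 3)  # assumed reduction factor
            partial.extend(agglomerate([[p] for p in part], target, num_reps, shrink))
    # Step 3: cluster the partial clusters down to k final clusters.
    clusters = agglomerate(partial, k, num_reps, shrink)
    reps = [representatives(c, num_reps, shrink) for c in clusters]
    # Step 4: label every original point by its nearest representative.
    return [
        min(range(len(clusters)), key=lambda ci: min(math.dist(p, r) for r in reps[ci]))
        for p in points
    ]
```

In this sketch several shrunken representative points stand in for each cluster, which is what lets the method follow non-spherical shapes while damping the influence of outlying points. Only the sample passes through the quadratic merging step; the remaining points are labelled in a single pass at the end.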

[1]  Donald K. Wedding,et al.  Discovering Knowledge in Data, an Introduction to Data Mining , 2005, Inf. Process. Manag..

[2]  David J. DeWitt,et al.  Parallel Database Systems: The Future of High Performance Database Processing 1 , 1992 .

[3]  Seema Maitrey A SURVEY: HIERARCHICAL CLUSTERING ALGORITHM IN DATA MINING , 2012 .

[4]  Karimnagar Salim Jiwani,et al.  A Survey on clustering , 2010 .

[5]  Margaret H. Dunham,et al.  Data Mining: Introductory and Advanced Topics , 2002 .

[6]  Yu Zheng,et al.  Trajectory Data Mining , 2015, ACM Trans. Intell. Syst. Technol..

[7]  Alex Alves Freitas,et al.  Mining Very Large Databases with Parallel Processing , 1997, The Kluwer International Series on Advances in Database Systems.

[8]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[9]  P. Johri,et al.  Survey on Privacy Preserving Data Mining , 2014 .

[10]  Geoffrey C. Fox,et al.  Parallel Data Mining from Multicore to Cloudy Grids , 2008, High Performance Computing Workshop.

[11]  O. M.MehdiOwrang,et al.  Handling large databases in data mining , 2000, IRMA Conference.

[12]  Charles Elkan,et al.  Scalability for clustering algorithms revisited , 2000, SKDD.

[13]  Song Sun,et al.  Analysis and acceleration of data mining algorithms on high performance reconfigurable computing platforms , 2011 .

[14]  An-yu Yu,et al.  USING THE AGGLOMERATIVE METHOD OF HIERARCHICAL CLUSTERING AS A DATA MINING TOOL IN CAPITAL MARKET 1 , 2008 .

[15]  P. Indirapriya,et al.  A Survey on Different Clustering Algorithms in Data Mining Technique , 2013 .

[16]  Matteo Riondato Sampling-Based Data Mining Algorithms: Modern Techniques and Case Studies , 2014, ECML/PKDD.

[17]  Rui Xu,et al.  Survey of clustering algorithms , 2005, IEEE Transactions on Neural Networks.

[18]  Paul S. Bradley,et al.  Scaling Clustering Algorithms to Large Databases , 1998, KDD.

[19]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[20]  Shouhong Wang,et al.  A knowledge management approach to data mining process for business intelligence , 2008, Ind. Manag. Data Syst..

[21]  Jin Dawei,et al.  The Application of Date Mining in Knowledge Management , 2011, 2011 Fifth International Conference on Management of e-Commerce and e-Government.

[22]  Vilas M. Thakare,et al.  DATA MINING SYSTEM AND APPLICATIONS: A REVIEW , 2010 .

[23]  Lokesh Singh,et al.  Clustering Techniques: A Brief Survey of Different Clustering Algorithms , 2012 .

[24]  Srinivasan Parthasarathy,et al.  Evaluation of sampling for data mining of association rules , 1997, Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications.

[25]  Erica Kolatch,et al.  Clustering Algorithms for Spatial Databases: A Survey , 2001 .

[26]  Richard J. Roiger,et al.  Data Mining: A Tutorial Based Primer , 2002 .

[27]  Dr. Chandra,et al.  A Survey on Clustering Algorithms for Data in Spatial Database Management Systems , 2011 .

[28]  S Ajay,et al.  SAMPLING TECHNIQUES & DETERMINATION OF SAMPLE SIZE IN APPLIED STATISTICS RESEARCH: AN OVERVIEW , 2014 .

[29]  R. V. Kulkarni,et al.  REVIEW OF LITERATURE ON DATA MINING , 2012 .

[30]  Neelamadhab Padhy,et al.  The Survey of Data Mining Applications And Feature Scope , 2012, ArXiv.

[31]  Shivani Goel,et al.  A comprehensive study on clustering approaches for big data mining , 2015, 2015 2nd International Conference on Electronics and Communication Systems (ICECS).

[32]  Sonali Agarwal,et al.  Stream Data Mining: Platforms, Algorithms, Performance Evaluators and Research Trends , 2016 .

[33]  GreenlawRaymond,et al.  Survey of Clustering , 2013 .

[34]  Inderjit S. Dhillon,et al.  Efficient Clustering of Very Large Document Collections , 2001 .

[35]  Alex Berson,et al.  Building Data Mining Applications for CRM , 1999 .