Efficient Dimensionality Reduction for Big Data Using Clustering Technique

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts by researchers in many disciplines, which reflects its broad appeal and its usefulness as one of the steps in exploratory data analysis. Clustering is useful in exploratory pattern analysis, grouping, decision making, and machine learning, in settings that include data mining, document retrieval, image segmentation, and pattern classification. We live in a digital world: every day, people generate massive amounts of data and store it for further analysis and management, and the amount of data in the world has been exploding. Big Data refers to extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations, especially those relating to human behavior and interactions. Because such data grows so rapidly, solutions need to be studied in order to handle these datasets and extract value and knowledge from them. An analysis of the different classes of available clustering techniques on big datasets may therefore provide significant and useful conclusions. The proposed work studies and analyzes some of the popular existing clustering techniques and the impact of dimensionality reduction on Big Data.
