Hybrid Tolerance Rough Set Based Intelligent Approaches for Social Tagging Systems

A major challenge in Big Data analysis is that huge amounts of data are generated over short periods, as happens in social tagging systems. Social tagging systems such as BibSonomy and del.icio.us have become increasingly popular with the widespread use of the internet. They let users annotate Web 2.0 resources with free-form tags, and those tags are widely used to interpret and classify the resources. Tag clustering is the process of grouping similar tags into clusters. It is useful for searching and organizing Web 2.0 resources and is important to the success of social tagging systems. Clustering tag data is expensive because the tag space of many social bookmarking websites is very large. So, instead of clustering the entire tag space of Web 2.0 data, the sufficiently frequent tags can be selected for clustering by applying feature selection techniques. The goal of feature selection here is to determine a minimal subset of bookmarked URLs from Web 2.0 data while retaining suitably high accuracy in representing the original bookmarks. In this chapter, the Unsupervised Quick Reduct feature selection algorithm is applied to find a set of the most commonly tagged bookmarks, and a Tolerance Rough Set (TRS) approach hybridized with metaheuristic clustering algorithms is proposed. The proposed approaches are Hybrid TRS and K-Means clustering (TRS-K-Means), Hybrid TRS and Particle Swarm Optimization (PSO) K-Means clustering (TRS-PSO-K-Means), and Hybrid TRS-PSO-K-Means-Genetic Algorithm (TRS-PSO-GA). These intelligent approaches determine the number of clusters automatically. They are compared with the benchmark K-Means algorithm on social tagging system data.
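The tolerance rough set idea behind the proposed hybrids can be illustrated with a small sketch. The abstract does not define the tolerance relation, so the code below assumes one common choice: two tags are tolerant of each other when they co-occur in at least `theta` bookmarks. The function names, the `theta` parameter, and the toy bookmarks are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(bookmarks, theta):
    """Return a dict mapping each tag to its tolerance class.

    Assumption (not from the chapter): the tolerance class of a tag is the
    tag itself plus every tag co-occurring with it in >= theta bookmarks.
    """
    cooccurrence = defaultdict(int)
    tags = set()
    for tagset in bookmarks:
        tags.update(tagset)
        # Count each unordered tag pair once per bookmark.
        for a, b in combinations(sorted(set(tagset)), 2):
            cooccurrence[(a, b)] += 1

    classes = {tag: {tag} for tag in tags}  # the relation is reflexive
    for (a, b), count in cooccurrence.items():
        if count >= theta:  # and symmetric
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(tagset, classes):
    """Tags whose tolerance class overlaps the given tag set."""
    target = set(tagset)
    return {tag for tag, cls in classes.items() if cls & target}


# Toy data: each bookmark is represented by the set of tags users gave it.
bookmarks = [
    {"python", "programming"},
    {"python", "programming", "tutorial"},
    {"rough-set", "clustering"},
    {"rough-set", "clustering", "python"},
]
classes = tolerance_classes(bookmarks, theta=2)
enriched = upper_approximation({"python"}, classes)
```

In this toy setting, a bookmark tagged only `python` is enriched with `programming` through its upper approximation, which gives a later clustering step such as K-Means a less sparse representation to work with.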
