Classification of web resources using user-generated terms

In this study, we propose a method for classifying web resources based on social tags generated by users. We examined whether social tags can serve as a tool for classifying websites within a given domain. We applied two statistical methods, principal component analysis (PCA) and hierarchical clustering, to classify websites in the domain of consumer health information. First, PCA was applied to identify the underlying dimensions of the selected websites. Six dimensions were extracted: women, seniors, kids/parenting, drugs, men, and research. Second, we conducted a hierarchical clustering analysis to group similar websites at different hierarchical levels. Together, the two methods show that social tags represent the characteristics of individual websites in the health information domain well. The methodological implication of this study is that social tags can be used to classify resources on the Web automatically.
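
As an illustration only, the two-step analysis can be sketched on a small website-by-tag count matrix. Everything in the sketch is an assumption made for the example and is not taken from the study: the site names, the tags, the counts, the use of cosine distance with average linkage, and the helper names first_component and agglomerate. The first principal component is found by power iteration on the covariance matrix, and the clustering is a plain agglomerative procedure that stops at a chosen number of groups.

```python
import math

# Hypothetical toy data (not from the study): rows are websites,
# columns are how often each tag was applied to that website.
sites = ["site_a", "site_b", "site_c", "site_d"]
tags = ["women", "pregnancy", "drugs", "pharmacy"]
counts = [
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [0, 0, 3, 6],
]


def first_component(rows, iterations=200):
    """Return (tag loadings, site scores) for the first principal component."""
    n, m = len(rows), len(rows[0])
    # Center each tag column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the tags.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0] + [0.0] * (m - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered website onto the component.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero tag vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerate(rows, k):
    """Average-linkage agglomerative clustering, stopping at k clusters."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[p] for j in clusters[q]]
                d = sum(cosine_distance(rows[i], rows[j])
                        for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, p, q)
        # Merge the closest pair of clusters.
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


loadings, scores = first_component(counts)
groups = agglomerate(counts, 2)
print([[sites[i] for i in g] for g in groups])
```

On this toy matrix the first component contrasts the two women-related tags with the two drug-related tags, and cutting the hierarchy at two clusters separates site_a and site_b from site_c and site_d. The sign of a principal component is arbitrary, so only the contrast between the loadings is meaningful, not which side is positive.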