Spatial Clustering Technique for Data Mining

For mining features from the social web, analysis of shape, detection of network topology and its corresponding meanings, and clustering of data are useful tools, because the information they yield reveals the relationships among data and their relative positions. For example, to understand the effect of someone’s statement on others, it is necessary to analyze the total interaction among all data elements and to evaluate the data of interest that results from those interactions; otherwise, the precise effect cannot be obtained. The effect thus becomes a special feature of the organized data, represented in a form suited to expressing the interaction. This feature, which is embedded in the social web and reflects the effect of someone’s statement, may be the shape of a network, the particular location of data, or a cluster. So far, most conventional representations of the data structure of the social web use networks, because objects are typically described by pairwise relations. The weakness of the network representation is scalability when dealing with huge numbers of objects on the Web: it is becoming standard to analyze or mine data from social-web networks with hundreds of millions of items. Complex network analysis mainly focuses on the shape or clustering coefficients of the whole network, and the aspects and attributes of the network are also studied using semistructured data-mining techniques. These methods use the whole network and its data directly, but scanning all objects in the network incurs a high computational cost. For that reason, the network node relocation problem is important for solving these social-web data-mining problems.
If we can relocate the objects of a network into a new space in which some aspects or attributes are easier to understand, we can more easily show or extract the features of shapes or clusters in that space, and network visualization becomes a space-relocation problem. Nonmetric multidimensional scaling (MDS) is a well-known technique for solving the space-relocation problem for networks. Kruskal (1964) showed how to relocate objects into an n-dimensional space using interobject similarity or dissimilarity. Komazawa & Hayashi (1982) solved Kruskal’s MDS as an eigenvalue problem, a method called quantification method IV (Q-IV). However, these techniques are limited in their ability to cluster objects because the stress, which is the attractive or repulsive force between two objects, is expressed by a linear formula. Thus, these methods can relocate objects to exact positions in a space, but it is difficult for them to carry clusters over into that space. This chapter introduces a novel technique called Associated Keyword Space (ASKS) for the space-relocation problem, which can create clusters from object correlations. ASKS is based on ...
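
To make the idea of relocating objects from pairwise dissimilarities concrete, the following is a minimal sketch of classical (Torgerson) MDS, which also reduces to an eigenvalue problem. It is a swapped-in illustration only: it is not Kruskal's nonmetric MDS, Q-IV, or ASKS, and the function name `classical_mds` and the toy dissimilarity matrix are assumptions made for this example.

```python
import numpy as np

def classical_mds(dissimilarity, n_dims=2):
    """Embed objects in n_dims dimensions from a symmetric dissimilarity matrix.

    Illustrative classical (Torgerson) MDS, not the chapter's own method.
    """
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Solve the symmetric eigenvalue problem (eigh returns ascending eigenvalues).
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues caused by rounding or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Hypothetical toy data: two tight pairs of objects that are far from each other.
toy = np.array([
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 1.0],
    [5.0, 5.0, 1.0, 0.0],
])
coords = classical_mds(toy, n_dims=2)
print(coords)
```

In this sketch the embedded positions depend linearly on the Gram matrix built from the dissimilarities, so the layout mirrors the input distances rather than pulling related objects into tighter clusters, which is consistent with the limitation described above.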

[1] Yuichi Yaguchi, et al. Cross-Media Data Mining Using Associated Keyword Space, 2009, 2009 Ninth IEEE International Conference on Computer and Information Technology.

[2] Jie Yan, et al. The small world and scale-free structure of an internet technical community, 2007, CHIMIT '07.

[3] Keitaro Naruse, et al. Word Space: A New Approach to Describe Word Meanings, 2006, The Sixth IEEE International Conference on Computer and Information Technology (CIT'06).

[4] Keitaro Naruse, et al. A mining method for linked Web pages using associated keyword space, 2006, International Symposium on Applications and the Internet (SAINT'06).

[5] A. Vázquez, et al. Network clustering coefficient without degree-correlation biases, 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[6] Christopher C. Yang, et al. Mining web site's topic hierarchy, 2005, WWW '05.

[7] Amin Vahdat, et al. Routing in an Internet-scale network emulator, 2004, The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, 2004. (MASCOTS 2004). Proceedings.

[8] Mohammed J. Zaki, et al. Visual web mining, 2004, WWW Alt. '04.

[9] Tsuyoshi Murata, et al. Visualizing the structure of Web communities based on data acquired from a search engine, 2003, IEEE Trans. Ind. Electron.

[10] Joao Antonio Pereira, et al. Linked: The new science of networks, 2002.

[11] Chris H. Q. Ding, et al. Automatic topic identification using webpage clustering, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[12] Albert-László Barabási, et al. Statistical mechanics of complex networks, 2001, ArXiv.

[13] A. Barabasi, et al. Lethality and centrality in protein networks, 2001, Nature.

[14] R. Albert, et al. The large-scale organization of metabolic networks, 2000, Nature.

[15] Hendrik Blockeel, et al. Web mining research: a survey, 2000, SKDD.

[16] W. Scott Spangler, et al. Clustering hypertext with applications to web searching, 2000, HYPERTEXT '00.

[17] Frank M. Shipman, et al. Proceedings of the eleventh ACM on Hypertext and hypermedia, 2000.

[18] Jon Kleinberg, et al. Authoritative sources in a hyperlinked environment, 1999, SODA '98.

[19] Jon M. Kleinberg, et al. Mining the Web's Link Structure, 1999, Computer.

[20] Duncan J. Watts, et al. Collective dynamics of ‘small-world’ networks, 1998, Nature.

[21] Ellen Spertus, et al. ParaSite: Mining Structural Information on the Web, 1997, Comput. Networks.

[22] Rick Kazman, et al. WebQuery: Searching and Visualizing the Web Through Connectivity, 1997, Comput. Networks.

[23] Ramana Rao, et al. Silk from a sow's ear: extracting usable structures from the Web, 1996, CHI.

[24] Roger N. Shepard, et al. Multidimensional scaling: theory and applications in the behavioral sciences, 1974.

[25] J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, 1964.

[26] A. Householder, et al. Discussion of a set of points in terms of their mutual distances, 1938.

[27] Kwok Yip Szeto, et al. Detecting Hierarchical Organization in Complex Networks by Nearest Neighbor Correlation, 2007, NICSO.

[28] Jörg Sander, et al. Focused Co-citation: Improving the Retrieval of Related Pages on the Web, 2003, WWW.

[29] Krishna Bharat, et al. Improved algorithms for topic distillation in a hyperlinked environment, 1998, SIGIR '98.

[30] David Lodge. Small World, 1988.