Data mining in a closed Web environment

Understanding the network of relationships forming on the World Wide Web requires tools that can extract the knowledge underlying it. Some of the most interesting of these relationships are revealed by co-link analysis, the Web analogue of co-citation analysis, in which two Web pages or sites are related whenever a third one links to both. We propose such an analysis, based on the co-links generated within a closed Web environment, using multivariate statistics (Principal Component Analysis and Multidimensional Scaling) and a connectionist technique (Kohonen's Self-Organizing Maps). We applied the approach to a generic thematic environment, and interpreting the results brought out the relationships and structures underlying it.
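
To make co-link counting concrete, here is a minimal Python sketch. It assumes that "closed environment" means only hyperlinks whose source and target both belong to a fixed set of sites are counted. The function name `colink_matrix` and the input format (a list of `(source, target)` pairs) are hypothetical illustrations, not the authors' procedure. The symmetric matrix it returns is the kind of input to which PCA, MDS, or a Self-Organizing Map could then be applied, possibly after a normalization that the abstract does not specify.

```python
from itertools import combinations


def colink_matrix(links, sites):
    """Count co-links among `sites` (hypothetical helper, illustrative only).

    links: iterable of (source, target) hyperlinks.
    Assumption: in a closed environment, only links whose source and target
    are both in `sites` are counted. Two sites are co-linked once for every
    third site in the set that links to both of them.
    """
    order = sorted(set(sites))
    index = {site: i for i, site in enumerate(order)}

    # Outgoing links per site, restricted to the closed set. Self-links are
    # dropped and repeated links collapse into a set, so each linking site
    # contributes at most one co-link per pair.
    out = {site: set() for site in order}
    for src, dst in links:
        if src in index and dst in index and src != dst:
            out[src].add(dst)

    # Every pair of targets sharing a linking site gains one co-link.
    n = len(order)
    matrix = [[0] * n for _ in range(n)]
    for targets in out.values():
        for a, b in combinations(sorted(targets), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return order, matrix


if __name__ == "__main__":
    sites = ["A", "B", "C", "D"]
    links = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("B", "C")]
    order, m = colink_matrix(links, sites)
    print(m[order.index("B")][order.index("C")])  # 2: co-linked by A and D
```

Collapsing repeated links from the same source into a set mirrors co-citation, where one citing paper contributes a single co-citation to each pair of documents it cites.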

Cristina Faba-Pérez et al. Scientometrics, 2003.