Data mining in a closed Web environment

Understanding the fabric of relationships building up on the World Wide Web calls for tools that can extract the underlying knowledge. Some of the most interesting relationships are those brought to light by co-link analysis (the Web analogue of co-citation analysis). We propose such an analysis based on the co-links generated within a closed Web environment, using multivariate statistics (Principal Component Analysis and Multidimensional Scaling) and a connectionist technique (Kohonen's Self-Organizing Maps). The method was applied to a generic thematic environment, and the underlying relationships and structures became apparent in the interpretation of the results.
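To make the notion of a co-link concrete: two sites are co-linked when the same source page links to both, and the co-link count of a pair is the number of source pages that do so. The sketch below is only an illustration under stated assumptions, not the procedure used in the study. The function name `colink_counts`, the page names, and the toy data are all hypothetical. A symmetric matrix built from such counts is the kind of input that Principal Component Analysis, Multidimensional Scaling, or a Self-Organizing Map could then be applied to.

```python
from collections import Counter
from itertools import combinations


def colink_counts(outlinks):
    """Count, for each unordered pair of targets, how many source pages link to both.

    `outlinks` maps a source page to the list of targets it links to.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    counts = Counter()
    for targets in outlinks.values():
        # De-duplicate targets so a page linking twice to one site counts once,
        # and sort so each pair has a single canonical (a, b) ordering.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


# Toy closed environment (invented data): four source pages, three targets.
outlinks = {
    "page1": ["A", "B", "C"],
    "page2": ["A", "B"],
    "page3": ["B", "C"],
    "page4": ["A"],
}

counts = colink_counts(outlinks)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

In this toy data, A and B are co-linked by two pages (page1 and page2), as are B and C (page1 and page3), while A and C are co-linked only by page1. A page with a single outlink, such as page4, contributes no co-links.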
