Automatic discovery of similarity relationships through Web mining

This work demonstrates how the World Wide Web can be mined in a fully automated manner to discover the semantic similarity relationships among the concepts surfaced during an electronic brainstorming session, thereby improving the accuracy of automated clustering of meeting messages. Our novel Context Sensitive Similarity Discovery (CSSD) method takes advantage of the meeting context when selecting a subset of Web pages for data mining, and then conducts regular concept co-occurrence analysis within that subset. Our results have implications for reducing information overload in applications of text technologies such as email filtering, document retrieval, text summarization, and knowledge management.
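
The two steps named in the abstract (context-based page selection, then co-occurrence analysis within the selected subset) can be illustrated with a minimal sketch. It is not the CSSD implementation: the abstract does not specify how pages are selected or how similarity is scored, so the keyword-overlap filter, the Jaccard-style score, the function names `select_context_pages` and `cooccurrence_similarity`, and the toy in-memory pages are all assumptions made for illustration. Concepts are treated as single words here.

```python
from itertools import combinations


def select_context_pages(pages, context_terms, min_hits=1):
    """Keep pages that mention at least `min_hits` context terms.

    Assumption: a simple keyword-overlap filter stands in for the
    (unspecified) context-sensitive page selection step.
    Returns the selected pages as sets of lowercase words.
    """
    selected = []
    for text in pages:
        words = set(text.lower().split())
        if len(words & context_terms) >= min_hits:
            selected.append(words)
    return selected


def cooccurrence_similarity(page_word_sets, concepts):
    """Score each concept pair by page-level co-occurrence.

    Assumption: a Jaccard-style ratio (pages containing both concepts
    divided by pages containing either) is used as the similarity.
    """
    sims = {}
    for a, b in combinations(sorted(concepts), 2):
        has_a = {i for i, w in enumerate(page_word_sets) if a in w}
        has_b = {i for i, w in enumerate(page_word_sets) if b in w}
        union = has_a | has_b
        sims[(a, b)] = len(has_a & has_b) / len(union) if union else 0.0
    return sims


# Toy stand-in for retrieved Web pages (hypothetical data).
pages = [
    "network router latency in campus deployment",
    "router firmware and network security patch",
    "fishing network repair guide",
    "campus network security policy for router access",
]
context_terms = {"campus", "security", "deployment"}  # meeting context
concepts = {"network", "router", "latency"}           # brainstormed concepts

subset = select_context_pages(pages, context_terms)
similarities = cooccurrence_similarity(subset, concepts)
for pair, score in sorted(similarities.items()):
    print(pair, round(score, 3))
```

In this toy data the page about fishing nets is filtered out because it shares no context terms with the meeting, so it does not affect the co-occurrence counts for "network". That is the intuition behind restricting the analysis to a context-relevant subset.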
