This paper explores building profiles of newsgroups from a corpus of Usenet e-mail messages, using standard statistical techniques together with fuzzy clustering methods. A large data set drawn from a number of newsgroups was analysed to extract text attributes such as the number of words, sentence length and other stylistic characteristics. Readability scores were also obtained using recognised assessment methods. These text attributes were used to build the newsgroups’ profiles. Three newsgroups, each with a similar number of messages, were selected from the processed sample for the analysis of two types of one-dimensional profile: one by text length and the other by readability score. These profiles are compared with the corresponding profiles of the whole sample and with those of a group of frequent participants in the newsgroups. Fuzzy clustering is then used to create two-dimensional profiles of the same groups, and an attempt is made to identify the newsgroups by locating the centres of the data clusters. It is contended that this approach to newsgroup profile analysis could facilitate a better understanding of computer-mediated communication on Usenet, a growing medium of informal business and personal correspondence.
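As an illustration of the two-dimensional profiling step, the following is a minimal sketch of fuzzy c-means clustering applied to (text length, readability score) pairs. It is not the procedure used in the paper: the choice of fuzzy c-means with fuzzifier m = 2, the function name `fuzzy_c_means`, its parameters and the sample values are all assumptions made for the example.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical helper: return (centres, memberships) for the given points.

    points     -- list of equal-length numeric tuples, e.g. (length, readability)
    n_clusters -- number of cluster centres to fit
    m          -- fuzzifier (> 1); m = 2 is an assumed, commonly used value
    """
    rng = random.Random(seed)
    dim = len(points[0])

    # Start from random memberships; each row is normalised to sum to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([v / total for v in row])

    centres = []
    for _ in range(n_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centres = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            centres.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))

        # Membership update from the Euclidean distance to every centre.
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [
                max(sum((p[d] - c[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for c in centres
            ]
            new_row = [
                1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            change = max(abs(a - b) for a, b in zip(new_row, memberships[i]))
            max_change = max(max_change, change)
            memberships[i] = new_row

        # Stop once the memberships have settled.
        if max_change < tol:
            break

    return centres, memberships


# Made-up (word count, readability score) pairs, for illustration only.
sample = [(50.0, 80.0), (60.0, 75.0), (55.0, 78.0),
          (400.0, 40.0), (420.0, 35.0), (410.0, 38.0)]
centres, memberships = fuzzy_c_means(sample, n_clusters=2)
for centre in sorted(centres):
    print(centre)
```

Each message receives a degree of membership in every cluster rather than a single hard label, and the fitted centres act as candidate identifiers for the groups. Because text length and readability are measured on different scales, the two attributes would probably need to be rescaled to comparable ranges before clustering real data.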