The Fractal Nature of Relevance: A Hypothesis

This article proposes a new model, based on fractal geometry, for clusters of relevant documents. The model reflects the relatively simple iterative search process used by interactive online searchers. Although untested, it has the additional attractive feature of highlighting the logarithmic growth of clusters, which produces complexities in relevance judgments and document clusters not realized by typical models. It indicates that clusters formed using dynamic search strategies appear topologically distinct and indecomposable, and that they result from chaotic processes. The model also provides an intuitive definition and representation of cluster dimension, which differentiates between clusters where typical models do not. The fractal model, then, gives an indication of what I believe are the limits on clustering relevant documents.
