Generation of Equifrequent Groups of Words Using a Genetic Algorithm

Genetic algorithms are a class of non‐deterministic algorithms that derive from Darwinian evolution and that provide good, though not necessarily optimal, solutions to combinatorial problems. We describe their application to the identification of characteristics that occur approximately equifrequently in a database, using two different methods for the creation of the chromosome data structures that lie at the heart of a genetic algorithm. Experiments with files of English and Turkish text suggest that the genetic algorithm developed here can produce results superior to those produced by existing non‐deterministic algorithms; however, the results are inferior to those produced by an existing deterministic algorithm.
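As a rough illustration of the kind of search described above, the following minimal sketch partitions an ordered list of word frequencies into `k` contiguous groups of approximately equal total frequency. It rests on several assumptions that are not taken from the abstract: the chromosome is a sorted list of cut points, fitness is the gap between the heaviest and lightest group, and selection simply keeps the better half of the population. The function names and parameter values are hypothetical, and this encoding is not necessarily either of the two chromosome constructions the paper evaluates.

```python
import random


def group_sums(freqs, cuts):
    """Total frequency of each contiguous group defined by the cut points."""
    bounds = [0, *cuts, len(freqs)]
    return [sum(freqs[a:b]) for a, b in zip(bounds, bounds[1:])]


def fitness(freqs, cuts):
    """Lower is better: gap between the heaviest and the lightest group."""
    sums = group_sums(freqs, cuts)
    return max(sums) - min(sums)


def random_cuts(n, k, rng):
    """A random chromosome: k - 1 distinct cut points in 1..n-1, sorted."""
    return sorted(rng.sample(range(1, n), k - 1))


def crossover(a, b, k, rng):
    """Child draws its k - 1 cut points from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return sorted(rng.sample(pool, k - 1))


def mutate(cuts, n, rng):
    """Move one randomly chosen cut point to an unused position."""
    cuts = list(cuts)
    free = [c for c in range(1, n) if c not in cuts]
    if free:
        cuts[rng.randrange(len(cuts))] = rng.choice(free)
    return sorted(cuts)


def evolve(freqs, k, pop_size=30, generations=200, mutation_rate=0.2, seed=0):
    """Assumed setup: truncation selection that keeps the better half."""
    rng = random.Random(seed)
    n = len(freqs)
    pop = [random_cuts(n, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(freqs, c))
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, k, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, n, rng)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda c: fitness(freqs, c))


if __name__ == "__main__":
    # Made-up frequencies, for illustration only.
    freqs = [50, 30, 20, 15, 12, 10, 8, 6, 5, 4, 3, 2]
    cuts = evolve(freqs, k=3)
    print(cuts, group_sums(freqs, cuts))
```

Because the search is stochastic, the result is a good partition rather than a provably optimal one, which matches the abstract's caveat that a deterministic algorithm can still do better.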
