Learning Syntactic Rules and Tags with Genetic Algorithms for Information Retrieval and Filtering: An Empirical Basis for Grammatical Rules

Abstract: The grammars of natural languages may be learned with genetic algorithms that reproduce and mutate grammatical rules and part-of-speech tags, improving the quality of later generations of grammatical components. Syntactic rules are randomly generated and then evolve; rules that result in improved parsing, and occasionally in improved retrieval and filtering performance, are allowed to propagate further. The LUST system learns the characteristics of the language or sublanguage used in document abstracts from the document rankings obtained from the parsed abstracts. Unlike approaches that apply traditional linguistic rules to retrieval and filtering, LUST develops grammatical structures and tags without first imposing common grammatical assumptions (e.g., part-of-speech assumptions), producing grammars that are empirically based and optimized for this particular application.
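The evolutionary loop described in the abstract can be illustrated with a minimal Python sketch. It is not the LUST implementation: the vocabulary, the tag labels, the function names, and above all `toy_fitness` are hypothetical stand-ins. In the system described, fitness would come from the quality of the document rankings obtained from the parsed abstracts; here a fixed toy scoring function takes its place so that the loop is runnable.

```python
import random

# Hypothetical toy setup: none of these names or values come from LUST itself.
VOCAB = ["retrieval", "of", "documents", "improves"]
TAGS = ["T0", "T1", "T2"]  # arbitrary labels; no part-of-speech meaning is assumed


def random_individual(rng):
    """An individual assigns one randomly chosen tag to each vocabulary word."""
    return {word: rng.choice(TAGS) for word in VOCAB}


def mutate(individual, rng, rate=0.25):
    """Copy an individual, re-drawing each tag with probability `rate`."""
    return {w: (rng.choice(TAGS) if rng.random() < rate else t)
            for w, t in individual.items()}


def crossover(a, b, rng):
    """Build a child by taking each word's tag from one parent at random."""
    return {w: (a[w] if rng.random() < 0.5 else b[w]) for w in VOCAB}


def evolve(fitness, generations=30, pop_size=20, seed=0):
    """Elitist loop: the fitter half survives unchanged and breeds the rest.

    Returns the best individual and the best score seen in each generation.
    """
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        survivors = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness), history


def toy_fitness(individual):
    """Stand-in score (assumption): counts agreements with an arbitrary target.

    A real system would instead score the ranking produced with this tagging.
    """
    target = {"retrieval": "T0", "of": "T1", "documents": "T0", "improves": "T2"}
    return sum(individual[w] == target[w] for w in VOCAB)


best, history = evolve(toy_fitness)
```

Because the surviving half is carried over unchanged, the best score in `history` never decreases from one generation to the next. Swapping `toy_fitness` for a function that measures ranking quality is the step that would tie such a loop to retrieval or filtering performance.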
