Automatic Learning of Language Model Structure

Statistical language modeling remains a challenging task, in particular for morphologically rich languages. Recently, new approaches based on factored language models have been developed to address this problem. These models provide principled ways of including additional conditioning variables other than the preceding words, such as morphological or syntactic features. However, the number of possible choices for model parameters creates a large space of models that cannot be searched exhaustively. This paper presents an entirely data-driven model selection procedure based on genetic search, which is shown to outperform both knowledge-based and random selection procedures on two different language modeling tasks (Arabic and Turkish).
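As a rough illustration of the kind of genetic search the abstract refers to, the following minimal sketch evolves bit strings in which each bit switches one candidate conditioning factor on or off. It is not the paper's implementation: the factor names, the `toy_fitness` function, and all hyperparameters are assumptions made for illustration. In a real setting the fitness would presumably come from training the encoded model and scoring it on held-out data.

```python
import random

# Hypothetical candidate conditioning factors (illustrative names only).
FACTORS = ["W1", "W2", "S1", "S2", "M1", "M2"]


def toy_fitness(genes):
    """Stand-in for a held-out score (higher is better).

    Assumption: a real system would train the model encoded by `genes`
    and return something like negative perplexity. Here we simply count
    agreements with an arbitrary, made-up target pattern.
    """
    target = (1, 0, 1, 1, 0, 1)
    return sum(g == t for g, t in zip(genes, target))


def tournament(population, fitness, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length gene tuples."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genes, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return tuple(1 - g if rng.random() < rate else g for g in genes)


def genetic_search(fitness, n_genes, pop_size=20, generations=30, seed=0):
    """Return the best gene tuple found; the elite always survives."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_genes))
        for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: carry the current best forward
        while len(children) < pop_size:
            a = tournament(population, fitness, rng)
            b = tournament(population, fitness, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    best = genetic_search(toy_fitness, len(FACTORS))
    chosen = [name for name, bit in zip(FACTORS, best) if bit]
    print("selected factors:", chosen, "fitness:", toy_fitness(best))
```

Because the elite individual is copied into every new generation, the best score in this sketch never decreases from one generation to the next. Replacing `toy_fitness` with a function that trains and evaluates an actual model would be the expensive step in practice.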
