Probabilistic logic with minimum perplexity: Application to language modeling

Any statistical model estimated from training data encounters sparse configurations, that is, configurations that were never encountered (seen) during the training phase. This inherent problem is a major challenge in many scientific fields. The probabilities of rare events are usually estimated with the maximum likelihood (ML) criterion. However, the ML estimator is well known to be sensitive to extreme values and is therefore unreliable. To address this challenge, we propose a novel approach based on probabilistic logic (PL) and the minimal perplexity criterion. In our approach, configurations are treated as probabilistic events, such as predicates related through logical connectors. We applied the method to estimate word trigram probabilities from a corpus. Experiments on several test sets show that the PL method with minimal perplexity outperformed both the "Absolute Discounting" and the "Good-Turing Discounting" techniques.
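The abstract contrasts ML estimation, which gives zero probability to any trigram unseen in training, with smoothed estimates judged by perplexity. The sketch below illustrates that contrast on a toy corpus. It is not the paper's PL method, which the abstract does not describe in enough detail to reproduce. It computes ML trigram estimates, one common interpolated form of absolute discounting (one of the baselines named above), and perplexity, defined as the exponential of the average negative log-probability. The discount `d = 0.5`, the add-one unigram fallback, the closed-vocabulary assumption, the toy corpus, and all function names are our own illustrative assumptions, not the paper's settings.

```python
import math
from collections import Counter


def train_counts(tokens):
    """Count trigrams c(u,v,w), histories c(u,v), distinct followers N1+(u,v,.) and unigrams."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    hist, followers = Counter(), Counter()
    for (u, v, _w), c in tri.items():
        hist[(u, v)] += c       # history count taken from trigrams so each P(.|u,v) sums to 1
        followers[(u, v)] += 1  # number of distinct words observed after (u, v)
    return tri, hist, followers, Counter(tokens)


def p_ml(w, u, v, tri, hist):
    """Maximum likelihood P(w | u, v) = c(u,v,w) / c(u,v); zero for any unseen trigram."""
    h = hist[(u, v)]
    return tri[(u, v, w)] / h if h else 0.0


def p_abs(w, u, v, tri, hist, followers, uni, d=0.5):
    """Interpolated absolute discounting over an add-one unigram (assumed variant).

    Requires 0 < d <= 1 so that the conditional distribution sums to 1.
    Assumes a closed vocabulary: every word w should occur in the training tokens.
    """
    n, vocab_size = sum(uni.values()), len(uni)
    p_low = (uni[w] + 1) / (n + vocab_size)  # add-one unigram, sums to 1 over the vocabulary
    h = hist[(u, v)]
    if h == 0:
        return p_low  # unseen history: fall back entirely to the unigram
    discounted = max(tri[(u, v, w)] - d, 0.0) / h
    reserved = d * followers[(u, v)] / h  # mass removed from the seen trigrams
    return discounted + reserved * p_low


def perplexity(prob, tokens):
    """exp(-(1/N) * sum log P(w_i | w_{i-2}, w_{i-1})) over the N trigram positions."""
    log_sum, n = 0.0, 0
    for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
        p = prob(w, u, v)
        if p == 0.0:
            return math.inf  # a single zero-probability event makes the perplexity infinite
        log_sum += math.log(p)
        n += 1
    return math.exp(-log_sum / n) if n else math.nan


# Toy corpus (illustrative only); the test trigram (on, the, cat) never occurs in training.
train = "the cat sat on the mat the dog sat on the rug".split()
test = "the dog sat on the cat".split()
tri, hist, followers, uni = train_counts(train)
pp_ml = perplexity(lambda w, u, v: p_ml(w, u, v, tri, hist), test)
pp_abs = perplexity(lambda w, u, v: p_abs(w, u, v, tri, hist, followers, uni), test)
print(f"ML perplexity: {pp_ml}, absolute discounting perplexity: {pp_abs:.3f}")
```

On this toy test sentence, the single unseen trigram drives the ML perplexity to infinity, while the discounted model remains finite because it reserves probability mass for unseen events. This is the sparse-data failure of ML that the abstract describes; the PL approach is the paper's alternative answer to it.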
