Improving on the smoothing technique for obtaining emission probabilities in hidden Markov models

Hidden Markov models (HMMs) have been shown to perform well on information extraction tasks. This paper describes the training stage of applying HMMs to metadata extraction from tagged bibliographic references. Its main contribution is an improvement to a technique proposed by earlier researchers for smoothing emission probabilities so that no emission probability is zero. The results show that the proposed method is effective.
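To illustrate why smoothing matters, the sketch below estimates emission probabilities P(word | state) from tagged tokens using plain additive (Laplace) smoothing. This is a generic textbook baseline, not the method proposed in the paper, whose details are not given here. The function name `estimate_emissions`, the parameter `alpha`, and the toy training data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_emissions(tagged_tokens, alpha=1.0):
    """Estimate P(word | state) with additive smoothing.

    tagged_tokens: iterable of (state, word) pairs from tagged references.
    alpha: pseudo-count added to every (state, word) pair (assumed value).
    """
    tagged_tokens = list(tagged_tokens)

    # Count how often each word is emitted by each state.
    counts = defaultdict(Counter)
    for state, word in tagged_tokens:
        counts[state][word] += 1

    # The vocabulary is every word seen in training, under any state.
    vocab = sorted({word for _, word in tagged_tokens})

    emissions = {}
    for state, word_counts in counts.items():
        # Adding alpha per vocabulary word keeps each row normalized.
        total = sum(word_counts.values()) + alpha * len(vocab)
        emissions[state] = {
            word: (word_counts[word] + alpha) / total for word in vocab
        }
    return emissions


# Hypothetical toy data: tokens of one reference, tagged by field.
training = [
    ("author", "smith"),
    ("author", "jones"),
    ("title", "hidden"),
    ("title", "markov"),
    ("title", "models"),
    ("year", "1998"),
]

emissions = estimate_emissions(training)
```

Without smoothing, a word such as "hidden" that never appears under the "author" state would get probability zero, and any reference containing it could not be decoded through that state. With `alpha=1.0`, every word in the vocabulary receives a small non-zero probability under every state, at the cost of taking some mass away from the observed words.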
