Standardization of References using Hidden Markov Model

By Swamynathan Sambamurthy. Examination Committee Chair: Dr. Kazem Taghva, Professor of Computer Science, University of Nevada, Las Vegas.

Technical papers generally include a list of bibliographic citations to support the arguments and the merits of the approach presented. Each citation is made up of parts such as author, journal, volume, and book. This thesis addresses the problem of extracting citations from a written document and separating each one into its parts. We use an Information Extraction (IE) technique based on a Hidden Markov Model (HMM) to solve this problem. The solution consists of the design of an HMM, the training of the HMM with tagged data, and an implementation of the Forward Chaining algorithm for extraction of citation parts. Our test on a collection of J50 citations has recall and precision of 0.8 and 0.81 respectively.
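The pipeline the abstract describes (train an HMM on tagged citations, then label the tokens of a new citation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the three states (author, title, year), the toy training data, the add-one smoothing, the `<unk>` placeholder, and the function names `train_hmm` and `viterbi`. The decoding step uses standard Viterbi decoding as a stand-in; the thesis names a Forward Chaining algorithm, whose details are not given here.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for tokens never seen in training


def train_hmm(tagged, alpha=1.0):
    """Estimate smoothed log-probabilities by counting over tagged citations.

    `tagged` is a list of citations, each a list of (token, state) pairs.
    """
    states = sorted({s for seq in tagged for _, s in seq})
    vocab = sorted({w.lower() for seq in tagged for w, _ in seq} | {UNK})
    start, trans, emit = Counter(), Counter(), Counter()
    for seq in tagged:
        start[seq[0][1]] += 1
        for w, s in seq:
            emit[(s, w.lower())] += 1
        for (_, a), (_, b) in zip(seq, seq[1:]):
            trans[(a, b)] += 1

    def normalise(counts, context, outcomes):
        # Add-alpha smoothing so unseen events keep a non-zero probability.
        total = sum(counts[(context, o)] for o in outcomes) + alpha * len(outcomes)
        return {o: math.log((counts[(context, o)] + alpha) / total) for o in outcomes}

    denom = len(tagged) + alpha * len(states)
    log_start = {s: math.log((start[s] + alpha) / denom) for s in states}
    log_trans = {s: normalise(trans, s, states) for s in states}
    log_emit = {s: normalise(emit, s, vocab) for s in states}
    return states, log_start, log_trans, log_emit


def viterbi(tokens, model):
    """Return the most probable state sequence for `tokens` under `model`."""
    states, log_start, log_trans, log_emit = model

    def e(s, w):
        # Unknown tokens fall back to the smoothed <unk> emission.
        return log_emit[s].get(w.lower(), log_emit[s][UNK])

    best = [{s: log_start[s] + e(s, tokens[0]) for s in states}]
    back = []
    for w in tokens[1:]:
        scores, ptrs = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + e(s, w)
            ptrs[s] = prev
        best.append(scores)
        back.append(ptrs)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]


# Toy tagged data, invented for this sketch.
TRAIN = [
    [("Smith", "author"), ("Parsing", "title"), ("citations", "title"), ("1999", "year")],
    [("Jones", "author"), ("Markov", "title"), ("models", "title"), ("2001", "year")],
]
model = train_hmm(TRAIN)
labels = viterbi(["Smith", "Markov", "citations", "2001"], model)
print(labels)
```

With this toy data the decoder labels the new token sequence as author, title, title, year. A realistic system would need far more training data, a richer state set (journal, volume, book, and so on), and token features such as punctuation or capitalization.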
