Paper information - LANGUAGE INDEPENDENT NAMED ENTITY RECOGNITION

The role of the Internet in personal, economic, and political advancement is growing at a fast pace. By the turn of the century, data on the web reaches petabytes or exabytes and may even scale up to unimaginable quantities. Extracting precise, structured information from such large amounts of unstructured or semi-structured data is a major concern on the web, known as Information Extraction. Named entity recognition (NER), also known as entity identification or entity extraction, is an important subtask of information extraction that seeks to locate atomic elements in text and classify them into predefined categories such as names of persons, organizations, and locations, monetary values, percentages, and expressions of time. NER has many applications in NLP, e.g., data classification, question answering, cross-language information access, machine translation, and query processing. Recognition of named entities (NEs) in English has reached accuracies nearing 98%. For English, many cues reveal the structure of the language (one important cue for identifying NEs is capitalization), which has made such high accuracies possible. In Indian languages, by contrast, no such cues are available, and each Indian language differs from the others in grammatical structure. Hence, developing a language-independent NER system is a challenging task. Previous work includes NER systems built with language-dependent tools such as POS taggers, dictionaries, chunk taggers, and gazetteer lists, as well as systems that relied on linguistic experts either to manually tag the training and testing data or to write rules for recognizing NEs. Language-independent approaches include supervised machine learning techniques such as CRF, HMM, MEMM, and SVM. These techniques need large amounts of manually tagged data, which is again a point of concern. Other approaches exploit external knowledge such as Wikipedia, but they do not utilize Wikipedia fully. Hence, the main objective of this work is to build a language-independent NER system without any manual intervention and without any language-dependent tools. The approach described throughout the work uses language-independent methods to identify, extract, and recognize NEs. Identification of NEs is done using external knowledge, namely

Vasudeva Varma | A. Lakshmi | Anantha Lakshmi
