Discovering Personal Name Prefix Patterns in a Thai Researcher Corpus and Its Application

In information extraction, a person’s name is one of the most important named entities, and extracted names are used in question-answering and summarization tasks. However, the boundary of a person’s name is often ambiguous because names are written in several patterns across online public data sources such as news, events, and researcher corpora. Name prefixes can serve as clue words or phrases for extracting, identifying, and unifying a person’s name. This paper proposes a name prefix discovery framework that collects an integrated researcher corpus from various data sources and extracts name prefix patterns. The framework has four main functions: collecting data from the data sources, tagging entities, preprocessing the researchers’ names, and finding personal name prefix patterns. In this work, data are gathered from six sources, and ten entities related to the research domain are considered. Preprocessing applies three sub-processes to obtain the researcher’s name. As a result, 408 personal name prefixes are extracted. Moreover, an API for extracting a person’s or researcher’s name is implemented using the Flask Python framework. The output of this work can support researcher name identification in the integrated researcher corpus.
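
As a rough illustration of how name prefixes can act as clue words for locating the start of a name, the following minimal sketch strips a leading run of known prefixes from a name string. The prefix list is a small hypothetical sample of common Thai titles, not the 408 prefixes reported in this work, and the function name `split_prefix` is an assumption made for illustration only.

```python
# Hypothetical sample of common Thai name prefixes (titles).
# This is NOT the list of 408 prefixes extracted in the paper.
SAMPLE_PREFIXES = ["นางสาว", "นาย", "นาง", "ดร.", "ศ.", "รศ.", "ผศ."]


def split_prefix(text, prefixes=SAMPLE_PREFIXES):
    """Split leading prefixes from a name string.

    Returns a tuple (found_prefixes, remaining_name).
    """
    # Try longer prefixes first so that "นางสาว" is matched before "นาง".
    ordered = sorted(prefixes, key=len, reverse=True)
    found = []
    rest = text.strip()
    matched = True
    while matched:
        matched = False
        for prefix in ordered:
            if rest.startswith(prefix):
                found.append(prefix)
                # Drop the prefix and any whitespace that follows it.
                rest = rest[len(prefix):].lstrip()
                matched = True
                break
    return found, rest
```

For example, `split_prefix("รศ.ดร.สมชาย ใจดี")` separates the stacked prefixes "รศ." and "ดร." from the name. Plain string matching of this kind would also strip a given name that happens to begin with a prefix string, so a practical system would need additional checks.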