Citation Analysis and Keyword Mining based on Fulltext Extraction of Scientific Literature

Citation analysis has long been studied as a research tool for domain information visualization, information retrieval, and bibliometric analysis. This paper proposes a three-step method for mining keyword relationships through citation graph analysis, based on the full text of scientific literature in a scientific publication database. First, we propose the Citation Probability Distribution Distance (CPDD) method to generate domain knowledge graphs based on domain and domain context, and we introduce three rules for merging these graphs into a new graph to improve performance. Second, we use a topic modeling method (Labeled LDA) to improve CPDD: by analyzing the distribution of citations over keywords, we avoid the strong assumption on which CPDD relies. In this way, we can find the topic distribution for each citation and establish the keyword relationship graph from citations. Finally, we use an optimized PageRank algorithm to evaluate the ranking results of the selected publications, taking into account not only citation counts but also the keyword relationships generated by citation analysis based on full-text extraction.
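The last step can be illustrated with a minimal sketch of a PageRank variant in which each citation edge is weighted by how strongly the citing and cited papers are related through keywords. This is an assumption-laden illustration, not the optimized algorithm of this paper: the function names, the mixing parameter `alpha`, and the use of Jaccard overlap between keyword sets as a stand-in for the keyword relationships derived from citation analysis are all hypothetical.

```python
from collections import defaultdict


def keyword_weighted_edges(citations, keywords, alpha=0.5):
    """Assign a weight to each (citing, cited) pair.

    Assumption: keyword relatedness is approximated by the Jaccard overlap
    of the two papers' keyword sets; `alpha` (hypothetical) controls how
    much that overlap contributes relative to the plain citation link.
    """
    edges = {}
    for citing, cited in citations:
        a, b = keywords[citing], keywords[cited]
        union = a | b
        jaccard = len(a & b) / len(union) if union else 0.0
        edges[(citing, cited)] = (1.0 - alpha) + alpha * jaccard
    return edges


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over weighted citation edges.

    `edges` maps (citing, cited) to a non-negative weight.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = defaultdict(float)
    for (src, _), w in edges.items():
        out_weight[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by papers with no outgoing weight is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            if out_weight[src] > 0.0:
                new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Toy data: paper C cites D (same keywords) and E (unrelated keywords).
citations = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")]
keywords = {
    "A": {"citation"},
    "B": {"lda"},
    "C": {"citation", "lda"},
    "D": {"citation", "lda"},
    "E": {"biology"},
}
ranks = weighted_pagerank(keyword_weighted_edges(citations, keywords))
```

In this toy graph, D and E each receive exactly one citation from C, so citation counts alone cannot separate them; the keyword-based edge weight gives D the larger share of C's rank.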