Topic Modeling of the Journal of the Wind Engineering Institute of Korea Using LSA and LDA

This study compares and evaluates the suitability of two topic modeling techniques, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), by applying them to extract the research subjects of the Journal of the Wind Engineering Institute of Korea. To evaluate the similarity between the extracted topics, a correlation analysis based on the document-topic matrix is proposed. LDA, which models topics through the probability of combinations of specific words, used more than twice as many words to compose a topic as LSA, which extracts topics from the feature vectors of the document-word matrix. As a result, the topics extracted by LDA were more independent of one another than those extracted by LSA. In summary, the research published in the journal takes ‘building’ and ‘bridge’ as the ‘research object’, investigates ‘wind speed’, ‘wind load’, and ‘vibration control’ as the ‘research purpose’, and uses ‘wind tunnel test’ or ‘numerical method’ as the ‘research method’. It is concluded that topic modeling should be improved to reflect how words are actually used, by defining the research subject as a structural combination of ‘research object’, ‘research purpose’, and ‘research method’.
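The correlation analysis on the document-topic matrix can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient was used, so Pearson correlation is an assumption here. The function names (`pearson`, `topic_correlation`, `mean_abs_offdiag`) and the toy matrix are hypothetical and are not taken from the study. Each row of the matrix is a document and each column is a topic weight; off-diagonal correlations close to zero would suggest that the topics are largely independent.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed metric)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant column has no defined correlation; report 0.0 here.
        return 0.0
    return cov / (sx * sy)


def topic_correlation(doc_topic):
    """Topic-by-topic correlation matrix from a document-topic matrix."""
    columns = list(zip(*doc_topic))  # one tuple of weights per topic
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]


def mean_abs_offdiag(corr):
    """Average absolute correlation between distinct topics."""
    k = len(corr)
    vals = [abs(corr[i][j]) for i in range(k) for j in range(k) if i != j]
    return sum(vals) / len(vals)


# Toy document-topic matrix (4 documents, 3 topics); illustrative values only.
doc_topic = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

corr = topic_correlation(doc_topic)
score = mean_abs_offdiag(corr)
print(corr)
print(score)
```

Under these assumptions, the same summary score could be computed for the document-topic matrices produced by LSA and by LDA; a lower value would indicate more independent topics.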