Classification of Tuberculosis DNA using LDA-SVM

Tuberculosis is a disease caused by the bacterium Mycobacterium tuberculosis. It is highly dangerous and ranks among the top 10 causes of death worldwide. Detection errors often occur because tuberculosis resembles other diffuse lung diseases. The challenge is to improve detection using DNA sequence data from Mycobacterium tuberculosis, which requires preprocessing the data. Features are extracted with k-mers, which are then weighted with TF-IDF. Because the resulting data is very large, dimension reduction is needed, and LDA is used for this purpose. Experiments show that the best k value is k = 4. With feature extraction using TF-IDF and dimension reduction using LDA, the performance evaluation yields accuracy = 0.927, precision = 0.930, recall = 0.927, F-score = 0.924, and MCC = 0.875.
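The feature extraction step can be illustrated with a minimal sketch. It assumes overlapping k-mers with k = 4 and a smoothed TF-IDF weighting, which is one common variant; the exact weighting used in the study is not stated here. The function names `kmers` and `tfidf` and the toy sequences are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def kmers(sequence, k=4):
    """Return all overlapping k-mers of a DNA sequence (assumed tokenization)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def tfidf(sequences, k=4):
    """Return (vocabulary, rows) of TF-IDF weights over k-mer counts.

    Assumption: term frequency is the relative k-mer count in a sequence,
    and IDF is the smoothed form ln((1 + n) / (1 + df)) + 1.
    """
    docs = [Counter(kmers(s, k)) for s in sequences]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    # Document frequency: number of sequences containing each k-mer.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for d in docs:
        total = sum(d.values())
        rows.append([(d[t] / total) * idf[t] if total else 0.0 for t in vocab])
    return vocab, rows


# Toy sequences, not real Mycobacterium tuberculosis data.
vocab, rows = tfidf(["ACGTA", "ACGTT"], k=4)
```

Each row is one sequence expressed as a vector over the k-mer vocabulary. With k = 4 there are at most 256 distinct k-mers over the alphabet A, C, G, T. Vectors of this kind would then be passed to the dimension reduction step and the classifier.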