HARD: SUBJECT-BASED SEARCH ENGINE USING TF-IDF AND JACCARD'S COEFFICIENT

This paper proposes a hybrid search engine concept based on the subject parameter of High Accuracy Retrieval from Documents (HARD). The search algorithm combines TF-IDF (Term Frequency-Inverse Document Frequency) with Jaccard's coefficient; both methods are modified and extended with several new formulas to support the concept. Several illustrative examples, including their calculation steps, are given to make the proposed algorithm and formulas easier to follow.

Keywords: HARD, TF-IDF, Jaccard's coefficient, search engine, fuzzy set.
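For orientation, the two building blocks named in the abstract can be sketched in their standard textbook forms. This is only a minimal sketch: the paper's modified and extended formulas are not given in this abstract, so nothing below should be read as the authors' method. The function names `tf_idf` and `jaccard`, the length-normalized term frequency, and the natural-log IDF are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard TF-IDF (not the paper's modified variant).

    docs: list of token lists. Returns one {term: weight} dict per document,
    with tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def jaccard(a, b):
    """Standard Jaccard coefficient |A ∩ B| / |A ∪ B| over two term collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Two empty sets share nothing to compare; return 0.0 by convention here.
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical toy corpus, for illustration only.
docs = [
    ["fuzzy", "set", "search"],
    ["search", "engine"],
    ["fuzzy", "logic"],
]
weights = tf_idf(docs)
similarity = jaccard(docs[0], docs[1])
```

In this toy corpus, a term that occurs in only one document (such as `set`) receives a higher weight than one shared across documents (such as `search`), and the Jaccard coefficient measures the overlap between the term sets of two documents.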