Applying Latent Semantic Indexing in Frequent Itemset Mining for Document Relation Discovery

Word-based relations among technical documents are highly useful but are often hidden within large collections of scientific publications. This work presents a method that applies latent semantic indexing in frequent itemset mining to discover potential relations among scientific publications. Two weighting schemes, tf and tfidf, are investigated in combination with latent semantic indexing. The proposed method is evaluated on a set of technical documents in a publication database by comparing the extracted document relations with their references (citations). To this end, the paper uses order accumulative citation matrices to evaluate the validity (quality) of the discovered patterns. The results show that, compared with the original method that uses no latent semantic indexing, the proposed method successfully discovers a set of document relations.
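
As a rough illustration of the two weighting schemes named above, the following minimal sketch computes tf and tfidf weights for a toy corpus. It is not the paper's implementation: the function names, the toy documents, and the specific tfidf variant (tf multiplied by log(N/df)) are assumptions made only for this example, and the latent semantic indexing and frequent itemset mining steps are not shown.

```python
import math
from collections import Counter


def tf_weights(docs):
    """Raw term frequency: one Counter of term counts per tokenized document."""
    return [Counter(tokens) for tokens in docs]


def tfidf_weights(docs):
    """tf * log(N / df); this is one common tfidf variant, assumed here."""
    n_docs = len(docs)
    tf = tf_weights(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for counts in tf for term in counts)
    return [
        {term: freq * math.log(n_docs / df[term]) for term, freq in counts.items()}
        for counts in tf
    ]


# Hypothetical toy corpus of already-tokenized documents.
docs = [
    ["latent", "semantic", "indexing"],
    ["frequent", "itemset", "mining"],
    ["latent", "itemset", "mining", "mining"],
]

tf = tf_weights(docs)
tfidf = tfidf_weights(docs)
```

Under this variant, a term that occurs in every document receives a tfidf weight of zero, whereas its tf weight depends only on how often it occurs within each document.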