Using Latent Semantic Indexing for Literature Based Discovery

Latent semantic indexing (LSI) is a statistical technique for improving information retrieval effectiveness. Here, we use LSI to assist in literature-based discoveries. The idea behind literature-based discoveries is that different authors have already published certain underlying scientific ideas that, when taken together, can be connected to hypothesize a new discovery, and that these connections can be made by exploring the scientific literature. We explore latent semantic indexing's effectiveness on two discovery processes: uncovering "nearby" relationships that are necessary to initiate the literature based discovery process; and discovering more distant relationships that may genuinely generate new discovery hypotheses.

As described by Swanson, there are two basic literature discovery processes. The first leads from the literature (R) associated with an initial topic to the literatures (I) of one or more related, intermediate topics. The second leads from one of these related topics to the literature (PD) associated with a potential discovery. Figure 1 illustrates these two steps (left to right). We call these two processes identifying intermediate literatures and identifying potential discovery literatures, respectively (Fig. 1). Our interest is learning if latent semantic indexing (Deerwester et al., 1990), a statistical technique used with success in information retrieval, can help with either or both of these processes.
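To make the two steps concrete, the following is a minimal sketch of how LSI might be applied to them, not the procedure evaluated in this paper. It assumes a toy term-by-document count matrix with one document standing in for each of the literatures R, I, and PD, a truncated singular value decomposition keeping k = 2 dimensions, and cosine similarity in the reduced space. The data, the choice of k, and the helper names `lsi_doc_vectors`, `cosine_matrix`, and `nearest` are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are documents.
# Document 0 stands in for R, 1 for I, 2 for PD.
# R and PD share no terms; each shares one term with I.
A = np.array([
    [2.0, 0.0, 0.0],  # term used only in R
    [1.0, 2.0, 0.0],  # term shared by R and I
    [0.0, 1.0, 2.0],  # term shared by I and PD
    [0.0, 0.0, 1.0],  # term used only in PD
])


def lsi_doc_vectors(matrix, k):
    """Return one k-dimensional vector per document via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Scale the top-k right singular vectors by their singular values.
    return (s[:k, None] * vt[:k]).T


def cosine_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T


def nearest(sim, i, exclude):
    """Index of the document most similar to i, skipping `exclude`."""
    scores = sim[i].copy()
    scores[list(exclude)] = -np.inf
    return int(np.argmax(scores))


doc_vecs = lsi_doc_vectors(A, k=2)  # k = 2 is an assumption
sim = cosine_matrix(doc_vecs)

# Step 1: from the initial literature R to an intermediate literature.
intermediate = nearest(sim, 0, exclude={0})
# Step 2: from that intermediate literature to a candidate discovery.
candidate = nearest(sim, intermediate, exclude={0, intermediate})

print("intermediate:", intermediate, "candidate:", candidate)
```

With only three documents the second step has a single possible answer; in a realistic collection each step would rank many candidate literatures by their similarity in the reduced space.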