Sparse matrix reordering schemes for browsing hypertext

Many approaches for retrieving documents from electronic databases depend on the literal matching of words in a user's query to the keywords defining database objects. Since there is great diversity in the words people use to describe the same object, literal- or lexical-based methods can often retrieve irrelevant documents. Another approach to exploit the implicit higher-order structure in the association of terms with text objects is to compute the singular value decomposition (SVD) of large sparse term by text-object matrices. Latent Semantic Indexing (LSI) is a conceptual indexing method which employs the SVD to represent terms and objects by dominant singular subspaces so that user queries can be matched in a lower-rank semantic space. This paper considers a third, intermediate approach to facilitate the immediate detection of document (or term) clusters. We demonstrate both traditional sparse matrix reordering schemes (e.g., Reverse Cuthill-McKee) and spectral-based approaches (e.g., Correspondence Analysis or Fiedler vector-based spectral bisection) that can be used to permute original term by document (hypertext) matrices to a narrow-banded form suitable for the detection of document (or term) clusters. Although this approach would not exploit the higher-order semantic structure in the database, it can be used to develop browsing tools for hypertext and on-line information at a reduced computational cost.
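As an illustration of the reordering idea, the following minimal sketch applies a simplified Reverse Cuthill-McKee sweep to the bipartite graph of a small term by document matrix. It is not the implementation used in this paper: the function name `rcm_bipartite`, the set-based input format, the tie-breaking rule, and the toy data are all assumptions made for the example. Terms and documents are treated as nodes of one graph, a breadth-first traversal visits neighbours in order of increasing degree, and the visiting order is reversed and then split back into a term ordering and a document ordering.

```python
from collections import deque


def rcm_bipartite(rows, n_docs):
    """Simplified Reverse Cuthill-McKee on a term/document bipartite graph.

    rows[i] is the set of document indices that contain term i (hypothetical
    input format). Returns (term_order, doc_order) as lists of indices.
    """
    m = len(rows)
    # Nodes 0..m-1 are terms; nodes m..m+n_docs-1 are documents.
    adj = [set() for _ in range(m + n_docs)]
    for t, docs in enumerate(rows):
        for d in docs:
            adj[t].add(m + d)
            adj[m + d].add(t)

    def by_degree(v):
        # Sort key: degree first, node index as an (assumed) tie-breaker.
        return (len(adj[v]), v)

    visited = [False] * (m + n_docs)
    order = []
    # Start each connected component from a lowest-degree unvisited node.
    for seed in sorted(range(m + n_docs), key=by_degree):
        if visited[seed]:
            continue
        visited[seed] = True
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours in order of increasing degree.
            for w in sorted(adj[v], key=by_degree):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)

    order.reverse()  # the "reverse" step of Reverse Cuthill-McKee
    term_order = [v for v in order if v < m]
    doc_order = [v - m for v in order if v >= m]
    return term_order, doc_order


# Hypothetical toy data: 6 terms x 6 documents, two interleaved clusters.
rows = [{1, 3}, {0, 2}, {3, 5}, {2, 4}, {1, 5}, {4}]
term_order, doc_order = rcm_bipartite(rows, n_docs=6)

# Print the permuted incidence pattern ("x" marks a term occurring in a document).
for t in term_order:
    print("".join("x" if d in rows[t] else "." for d in doc_order))
```

Because each connected component is traversed in one contiguous run, the two clusters in the toy data end up as separate diagonal blocks of the permuted matrix. For matrices of realistic size, a library routine operating on a sparse matrix format would be the more practical choice.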