Representing interests as a hyperlinked document collection

We describe a latent variable model for representing a user's interests as a hyperlinked document collection. By collecting hyper-text documents that a user views, creates or updates whilst at their computer, we are able to use not only the content of these documents but also the inter-connectivity of the collection to model the user's interests. The model uses Probabilistic Latent Semantic Analysis and Probabilistic Hypertext Induced Topic Selection and decomposes the user's document collection into a set of factors each of which represents a user's interest. This model can be used to personalise information access tasks such as a personalised search engine or a personalised news service. Our latent variable model's performance is compared with that of a more conventional vector space clustering algorithm.