Paper information - Leveraging Unstructured Information Using Topic Modelling

Unstructured information in the form of natural language text is abundant in many kinds of organisations. To increase information sharing, organisational learning, decision-making and productivity, large amounts of unstructured text need to be analysed on a daily basis. Full-text searching alone is not sufficient as a first approach to helping users understand what a collection of electronic documents is about, since it does not give the user an overview of the underlying concepts in the collection. A topic model is a useful mechanism for identifying and characterising the concepts embedded in a document collection, allowing the user to navigate the collection in a topic-guided manner. Topics, made up of significant words, give the user an overview of the content of the collection. Each document is represented as a mixture of automatically constructed topics, so the user may select the documents related to a specific topic of interest or, conversely, the topics related to a specific document. Similarities between documents may be found by looking at which documents are assigned to a specific topic, enabling the user to find other documents related to a given document. This methodology enables users to digest a larger number of documents, so that they spend more of their time reading relevant information than searching for it.
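The idea of representing each document as a mixture of topics, and of finding related documents through a shared topic, can be illustrated with a small sketch. The abstract does not name a specific topic-modelling algorithm, so the code below assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling. The toy corpus, the function names (`fit_lda`, `dominant_topic`, `related_documents`) and all parameter values are hypothetical choices made for illustration, not the authors' implementation.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (assumed algorithm, not from the paper).

    docs is a list of tokenised documents (lists of words).
    Returns (mixtures, top_words): per-document topic proportions and
    up to three most frequent words per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    mixtures = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [w for w, c in topic_word[t].most_common(3) if c > 0]
        for t in range(num_topics)
    ]
    return mixtures, top_words


def dominant_topic(mixture):
    """Index of the topic with the largest share in a document's mixture."""
    return max(range(len(mixture)), key=lambda t: mixture[t])


def related_documents(mixtures, doc_index):
    """Other documents whose dominant topic matches that of doc_index."""
    topic = dominant_topic(mixtures[doc_index])
    return [d for d, m in enumerate(mixtures)
            if d != doc_index and dominant_topic(m) == topic]


# Hypothetical toy corpus: two finance-like and two IT-like documents.
docs = [
    "budget invoice payment budget".split(),
    "invoice payment supplier budget".split(),
    "server network outage server".split(),
    "network outage server backup".split(),
]
mixtures, top_words = fit_lda(docs)
for t, words in enumerate(top_words):
    print("topic", t, "significant words:", words)
for d, mixture in enumerate(mixtures):
    print("document", d, "mixture:", [round(p, 2) for p in mixture])
print("documents related to document 0:", related_documents(mixtures, 0))
```

The words listed per topic correspond to the overview of the collection described above, and `related_documents` mirrors the idea of finding similar documents by looking at which documents are assigned to the same topic. On a corpus this small the learned topics depend on the random seed, so the output should be read as an illustration only.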

N. D. du Preez | J. W. Uys | E. W. Uys