Identifying Document Metadata Based on Multilayer Clustering

This paper presents a technique for semi-automatically identifying document metadata when a knowledge management system is installed. Document management systems often deal with large collections of documents, and this information must be searchable by the knowledge worker. Supporting techniques are needed to help knowledge workers find information, and many of these techniques rely on each document having metadata. The techniques presented in this paper are based on a novel approach called multilayer clustering. Using this clustering technique, each document can be assigned to one or more document types. In addition to this type assignment, properties and their values are assigned to the document based on term networks extracted from it. Preliminary tests were performed on one public dataset and several private datasets. The results of these tests indicate that the approach is promising.