System of Semantic Integration of Non-Structuralized Documents in Natural Language in the Domain of Metallurgy

This paper presents the assumptions behind a system for automatic cataloging and semantic searching of text documents, using a document repository on metals processing technology as an example. By means of an ontological model, the system offers the user a new approach to exploring database resources: an easier and more intuitive information search. In current document storage systems, searching is often based only on keywords and descriptions created manually by the system administrator. Text mining methods, especially latent semantic indexing, allow documents to be clustered automatically with respect to their content. The result of this clustering is integrated with the ontological model, which makes navigation through document resources intuitive and removes the need to create directories manually. Such an approach seems particularly useful for large repositories of unstructured documents drawn from sources such as the Internet. This situation is typical when searching for information and knowledge in metallurgy, for example with regard to innovations and non-traditional suppliers of materials and equipment.
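
To illustrate how latent semantic indexing can group documents by content, a minimal sketch in Python follows. It is not the implementation of the system described here: the five-sentence corpus, the function names, the raw term-count weighting, the number of latent dimensions (k = 2) and the cosine threshold of 0.9 are all assumptions chosen for illustration. The sketch relies on the standard identity that the right singular vectors of the term-document matrix A are the eigenvectors of A^T A, and it approximates the leading ones by power iteration with deflation so that no external library is needed. The final grouping step is a simple greedy assignment, used here in place of whatever clustering method a real system would adopt.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real repository would hold full documents.
DOCS = [
    "casting mold sand defects",
    "sand mold preparation",
    "casting defects porosity",
    "rolling mill strip thickness",
    "strip thickness control",
]

def term_document_matrix(docs):
    """Return (vocabulary, A), where A[i][j] counts term i in document j."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]

def gram(a):
    """Document-by-document Gram matrix A^T A."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]

def top_eigenpairs(sym, k, iters=500):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed start vector: deterministic run
        for _step in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def lsi_coordinates(docs, k):
    """Coordinates of each document in the k-dimensional latent space (Sigma_k V_k^T)."""
    _, a = term_document_matrix(docs)
    pairs = top_eigenpairs(gram(a), k)
    # Eigenvalues of A^T A are squared singular values, hence the square root.
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(len(docs))]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster(coords, threshold=0.9):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    clusters = []
    for j, c in enumerate(coords):
        for members in clusters:
            if cosine(coords[members[0]], c) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters

if __name__ == "__main__":
    print(cluster(lsi_coordinates(DOCS, 2)))
```

In this toy corpus the second and third documents share no words, so a purely keyword-based comparison would treat them as unrelated. In the latent space they fall into the same group because both overlap with the first document, which is the kind of content-based grouping the abstract refers to.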