A Neural Principal Component Analysis for Keyword Extraction from Text-Based Documents

Users of information retrieval systems, such as those operating on the web, usually formulate their queries in text to search not only for textual information but also for multimedia content. To satisfy these users' requirements, an information retrieval system must prepare a short representation of the content of each document in the corpus, called an index. This index often does not reflect the intended meaning of the document it represents. In this paper, we propose an approach based on a Neural Principal Component Analysis that captures the maximum variance of the data and extracts the principal component from it by computing the correlation between the words of each document, in order to determine the keywords that reveal the fields of interest of each document's content.
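The abstract does not state which neural network, learning rule, or document representation the approach uses, so the following is only an illustration under assumptions, not the authors' implementation. It treats each sentence of a document as one observation of word counts, centers those counts, and trains a single linear neuron with Oja's rule (a standard neural PCA learning rule, swapped in here as an assumption) so that its weight vector converges to the first principal component of the word-by-word covariance. Words with the largest absolute loadings are returned as keywords. The helper names `tokenize`, `centered_sentence_vectors`, `oja_first_component`, and `extract_keywords`, and the parameters `epochs`, `lr`, and `seed`, are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(sentence):
    # Assumed preprocessing: lower-case alphabetic tokens, no stop-word removal.
    return re.findall(r"[a-z]+", sentence.lower())


def centered_sentence_vectors(sentences):
    """Return (vocab, rows): one mean-centered word-count vector per sentence.

    Sentences are the observations and words are the features, so the
    covariance of these rows is the word-by-word co-occurrence covariance.
    """
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted(set().union(*counts))
    rows = [[c[w] for w in vocab] for c in counts]
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(vocab))]
    return vocab, [[r[j] - means[j] for j in range(len(vocab))] for r in rows]


def oja_first_component(samples, epochs=500, lr=0.01, seed=0):
    """Train one linear neuron with Oja's rule on centered samples.

    Its weight vector converges (for a small enough learning rate) to the
    unit-norm leading eigenvector of the samples' covariance matrix.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in samples[0]]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = samples[i]
            y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output
            # Hebbian term y*x plus a decay -y^2*w that keeps |w| close to 1.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


def extract_keywords(sentences, k=3, **train_kwargs):
    """Rank words by the magnitude of their loading on the first component."""
    vocab, samples = centered_sentence_vectors(sentences)
    w = oja_first_component(samples, **train_kwargs)
    # The sign of a principal component is arbitrary, so rank by |loading|.
    ranked = sorted(zip(vocab, w), key=lambda p: (-abs(p[1]), p[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    doc = [
        "neural network pca neural network",
        "neural network weights neural network",
        "cat mat",
        "dog mat",
    ]
    print(extract_keywords(doc, k=2))
```

In this toy document, "neural" and "network" co-occur in half of the sentences and are absent from the other half, so they carry most of the variance and dominate the first component. Centering alone yields the covariance matrix; also dividing each word's column by its standard deviation would make the neuron follow the word correlation matrix that the abstract mentions.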