Some keyword-based characteristics for evaluation of thematic structure of multidisciplinary documents

We consider the problem of classifying documents of a complex interdisciplinary character that contain a high level of informational noise. The set of classification domains is assumed to be fixed, and each domain is defined by an appropriate keyword list. We discuss quantitative and qualitative characteristics, as well as visual presentations, used for such classification. We also present a program, Text Recognizer, that is based on these characteristics.
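
To make the idea of domains defined by keyword lists concrete, here is a minimal sketch of one possible quantitative characteristic: the share of a document's tokens that match each domain's keywords. The abstract does not specify the characteristics actually used, so the function `domain_profile`, the sample `DOMAINS` dictionary, and the density measure itself are illustrative assumptions, not the method implemented in Text Recognizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def domain_profile(text, domains):
    """Return, for each domain, the fraction of tokens matching its keywords.

    `domains` maps a domain name to a list of single-word keywords.
    This density measure is an assumed, illustrative characteristic.
    """
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    profile = {}
    for name, keywords in domains.items():
        # Deduplicate keywords so a repeated entry is not counted twice.
        hits = sum(counts[k.lower()] for k in set(keywords))
        profile[name] = hits / total if total else 0.0
    return profile


# Hypothetical keyword lists for two fixed domains.
DOMAINS = {
    "physics": ["energy", "particle", "quantum"],
    "biology": ["cell", "protein", "gene"],
}

text = (
    "The protein absorbs energy in the cell. "
    "Quantum effects in the cell are small."
)
profile = domain_profile(text, DOMAINS)
for name, share in sorted(profile.items()):
    print(f"{name}: {share:.3f}")
```

A multidisciplinary document would show non-negligible shares for several domains at once, while a noisy document would show low shares everywhere. Multi-word keywords and stemming are deliberately left out of this sketch.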
