Document Style Recognition Using Shallow Statistical Analysis

Documents differ not only in topic but also in style. Style is a very broad and ambiguous term used in the arts, fashion, literary criticism, and linguistics. In the case of text documents, we can accept the intuitive understanding that style relates mainly to the form of a document (how something is said), whereas topic relates to its content (what is said). Although some topics strictly determine the style that can be used, most topics can be expressed in various styles. Thus, style can be considered orthogonal to topic in a certain sense. Style can therefore be assumed to be a useful parameter in many text processing and information retrieval tasks.