Mining textual significant expressions reflecting opinions in natural languages

Revealing an opinion hidden in a text document is a challenging task. This article presents a method based on the automatic extraction of expressions that are significant for determining a document's attitude to a given topic. The significant expressions are built from significant words found in the documents. These words are selected by the C5 decision-tree generator, which is based on entropy minimization. Words that appear in the tree's branches form the kernels of the significant expressions. The full expressions consist of the significant words together with the words surrounding them in the original documents. Such expressions provide much more information than individual (key)words and can be used to analyse a document's meaning and the cause of the opinion: what exactly does the opinion concern? The results are demonstrated on large real-world multilingual data representing customers' opinions written in free form.
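
The two steps described above, selecting significant words by entropy and then expanding them with their neighbouring words, can be illustrated with a minimal Python sketch. It is not the authors' implementation: instead of growing a full C5 tree it ranks words by single-split information gain, which is the entropy-based criterion such a tree applies at each node. The toy reviews, the function names, the `top_k` cut-off and the `window` size are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, word):
    """Entropy reduction from splitting documents on the presence of `word`."""
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    total = len(labels)
    remainder = (len(with_word) / total) * entropy(with_word) \
        + (len(without_word) / total) * entropy(without_word)
    return entropy(labels) - remainder


def significant_words(docs, labels, top_k=2):
    """Rank the vocabulary by information gain (a stand-in for tree branches)."""
    vocabulary = sorted({w for doc in docs for w in doc})
    ranked = sorted(vocabulary,
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    return ranked[:top_k]


def expressions(docs, kernels, window=1):
    """Expand each kernel word with up to `window` neighbours on each side."""
    found = []
    for doc in docs:
        for i, word in enumerate(doc):
            if word in kernels:
                found.append(" ".join(doc[max(0, i - window): i + window + 1]))
    return found


# Hypothetical toy data: tokenised reviews with an opinion label.
reviews = [
    ("the room was very clean", "pos"),
    ("very clean and quiet hotel", "pos"),
    ("the room was very dirty", "neg"),
    ("dirty bathroom and noisy street", "neg"),
]
docs = [text.split() for text, _ in reviews]
labels = [label for _, label in reviews]

kernels = significant_words(docs, labels, top_k=2)
phrases = expressions(docs, kernels, window=1)
print(kernels)
print(phrases)
```

On this toy data the highest-ranked words are the ones that separate positive from negative reviews, and the extracted phrases show what each kernel word refers to, which is the extra information that isolated keywords lack.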