A software environment for the multi-aspect study of the lexical characteristics of text is presented. The environment provides tools for automatically building a dictionary from a text corpus of interest. The toolkit focuses on lexical units that act as markers and indicators of higher-level objects. The environment supports a range of text analysis tasks because it integrates several tools for language research and allows vocabularies to be customized to a problem area. It includes interfaces for developing vocabularies and a system of features. Concordance construction tools are provided for studying the contexts in which terms are used; concordances let the researcher test hypotheses about the function of a particular lexical unit. To describe more complex constructions to be extracted, the user can write search patterns in a user-friendly pattern language. These patterns make it possible to develop lexicographic resources that contain not only traditional vocabularies and stable, inseparable lexical phrases but also language constructs with a more complex structure.
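As a rough illustration of two of the operations named above, building a vocabulary from a corpus and constructing a concordance for a term, the following Python sketch may help. It is not the environment's implementation: the function names (`tokenize`, `build_vocabulary`, `concordance`), the regex-based tokenizer, and the sample corpus are all assumptions made for this example, and a real system would presumably use morphological analysis rather than lowercased word forms.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of word characters, lowercased.
TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text):
    """Split text into lowercased word tokens."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def build_vocabulary(corpus):
    """Count how often each word form occurs across all documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(tokenize(doc))
    return counts


def concordance(corpus, term, width=3):
    """Return (left context, keyword, right context) for each occurrence.

    `width` is the number of tokens kept on each side of the keyword.
    """
    term = term.lower()
    rows = []
    for doc in corpus:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == term:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                rows.append((left, tok, right))
    return rows


# Illustrative two-document corpus (invented for this sketch).
corpus = [
    "The term extraction tool builds a dictionary from the corpus.",
    "Each term in the dictionary has a set of features.",
]
vocab = build_vocabulary(corpus)
rows = concordance(corpus, "term", width=2)

for left, keyword, right in rows:
    # Align the keyword in a fixed column, as in a keyword-in-context view.
    print(f"{left:>20} [{keyword}] {right}")
```

Keeping the left and right contexts as separate fields lets the rows be sorted by either side, which is the usual way a researcher scans a concordance for recurring patterns around a lexical unit.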