Terminology extraction in the field of water environment based on rules and statistics

Acquiring terminology in the water environment domain is key to constructing ontologies for that domain, and it is also an important part of information extraction and information retrieval. This paper proposes a method that combines rules and statistics to extract water environment terms. First, the pre-processed text is segmented with an N-gram algorithm. Second, rules are applied to filter the resulting candidate strings. Third, improved mutual information and adjacency entropy are used to filter the candidates further and obtain candidate terms. Finally, TF-IDF is used to select the terms related to the water environment domain. Experiments show that the method achieves good results in extracting water environment terminology.
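The statistical stages named above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it uses standard pointwise mutual information rather than the improved variant (whose definition is not given here), it omits the rule-based filter, and all function names and the smoothed IDF formula are assumptions chosen for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pmi(bigram, tokens):
    """Standard pointwise mutual information of a bigram (log base 2).

    Assumption: this is plain PMI, not the paper's improved variant.
    The bigram must occur at least once in `tokens`.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(ngrams(tokens, 2))
    p_xy = bigram_counts[bigram] / sum(bigram_counts.values())
    p_x = unigram_counts[bigram[0]] / len(tokens)
    p_y = unigram_counts[bigram[1]] / len(tokens)
    return math.log2(p_xy / (p_x * p_y))


def entropy(counter):
    """Shannon entropy (bits) of the distribution given by a Counter."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def adjacency_entropy(bigram, tokens):
    """Minimum of left and right neighbour entropy for a bigram.

    A high value means the candidate appears in varied contexts,
    which suggests it is a complete unit rather than a fragment.
    """
    left, right = Counter(), Counter()
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) == bigram:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 2 < len(tokens):
                right[tokens[i + 2]] += 1
    return min(entropy(left), entropy(right))


def tf_idf(term, doc, corpus):
    """TF-IDF of an n-gram term in one document of a tokenized corpus.

    Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    """
    grams = ngrams(doc, len(term))
    if not grams:
        return 0.0
    tf = grams.count(term) / len(grams)
    df = sum(1 for d in corpus if term in ngrams(d, len(term)))
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf
```

In such a pipeline, candidates whose mutual information or adjacency entropy falls below a chosen threshold would be discarded before the TF-IDF ranking; the thresholds themselves are tuning parameters that this sketch does not fix.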