Semantic Similarity Computing Model Based on Multi Model Fine-Grained Nonlinear Fusion

Natural language processing (NLP) has achieved excellent performance in many fields, including semantic understanding, automatic summarization, and image recognition. However, most neural network models for NLP extract text features in a fine-grained way, which makes it hard to grasp the meaning of the text from a global perspective. To alleviate this problem, this paper combines traditional statistical methods with a deep learning model and proposes a novel model based on multi-model nonlinear fusion. The model measures sentence similarity with three separate methods: a Jaccard coefficient based on part of speech, Term Frequency-Inverse Document Frequency (TF-IDF), and a word2vec-CNN algorithm. A normalized weight coefficient is derived from the calculation accuracy of each model, and the calculation results are compared. The weighted vector is then fed into a fully connected neural network, which gives the final classification result. Because the statistical sentence similarity algorithms reduce the granularity of feature extraction, the model can grasp sentence features globally. Experimental results show that the sentence similarity calculation method based on multi-model nonlinear fusion achieves a matching rate of 84% and an F1 value of 75%.
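The statistical part of the pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the plain token-set Jaccard coefficient (the paper uses a part-of-speech-based variant whose details are not given here), the smoothed IDF formula, and the rule of dividing each accuracy by the sum of accuracies are all assumptions made for illustration. The word2vec-CNN scorer and the fully connected classifier are omitted.

```python
import math
from collections import Counter


def jaccard(tokens_a, tokens_b):
    """Jaccard coefficient: size of the token intersection over the union.

    Assumption: plain token sets; the part-of-speech filtering used in the
    paper is not reproduced here.
    """
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tfidf_cosine(tokens_a, tokens_b, corpus):
    """Cosine similarity between TF-IDF vectors of two tokenized sentences.

    `corpus` is a list of tokenized sentences used for document frequencies.
    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))

    def vectorize(tokens):
        counts = Counter(tokens)
        total = len(tokens)
        return {
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }

    vec_a, vec_b = vectorize(tokens_a), vectorize(tokens_b)
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalized_weights(accuracies):
    """Turn per-model accuracies into weights that sum to 1 (assumed rule)."""
    total = sum(accuracies)
    return [acc / total for acc in accuracies]


def weighted_vector(scores, weights):
    """Scale each model's similarity score by its normalized weight."""
    return [score * weight for score, weight in zip(scores, weights)]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    corpus = [
        ["the", "cat", "sat"],
        ["the", "cat", "slept"],
        ["a", "dog", "barked"],
    ]
    scores = [
        jaccard(corpus[0], corpus[1]),
        tfidf_cosine(corpus[0], corpus[1], corpus),
    ]
    weights = normalized_weights([0.7, 0.8])  # hypothetical accuracies
    print(weighted_vector(scores, weights))
```

In the model described in the abstract, a vector of this kind (with a third entry from the word2vec-CNN scorer) would be the input to the fully connected network that produces the final classification.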
