Information mining and similarity computation for semi- / un-structured sentences from the social data

Abstract In recent years, with the development of the social Internet of Things (IoT), all kinds of data have accumulated on the network. These data contain a great deal of social information and opinions, yet they are rarely analyzed in full, which is a major obstacle to the intelligent development of the social IoT. In this paper, we propose a sentence similarity analysis model to measure how similar people’s opinions are on hot topics in social media and news pages. Most of these data are unstructured or semi-structured sentences, so the accuracy of sentence similarity analysis largely determines the model’s performance. To improve accuracy, we propose a novel sentence similarity computation method that extracts the syntactic and semantic information of semi-structured and unstructured sentences. We mainly consider the subjects, predicates, and objects of sentence pairs, and use the Stanford Parser to classify the dependency relation triples, from which we calculate the syntactic and semantic similarity between two sentences. Finally, we verify the performance of the model on the Microsoft Research Paraphrase Corpus (MRPC), which consists of 4076 training sentence pairs and 1725 test sentence pairs, most of which come from news in social data. Extensive simulations demonstrate that our method outperforms other state-of-the-art methods in terms of the correlation coefficient and the mean deviation.
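
To make the idea of comparing sentences through their subjects, predicates, and objects concrete, the following is a minimal sketch and not the method of the paper. It assumes the dependency triples have already been produced by a parser, and everything else is a hypothetical choice made for illustration: the `Triple` type, the mapping from the relation labels `nsubj`, `root`, and `dobj` to roles, the role weights, and the use of Jaccard overlap as the per-role score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One dependency relation triple: relation(head, dependent)."""
    relation: str
    head: str
    dependent: str


# Assumed mapping from dependency labels to the three roles of interest.
ROLE_OF_RELATION = {"nsubj": "subject", "root": "predicate", "dobj": "object"}

# Assumed weights (illustrative only); they sum to 1.
ROLE_WEIGHTS = {"subject": 0.3, "predicate": 0.4, "object": 0.3}


def words_by_role(triples):
    """Group the dependent words of the triples by subject/predicate/object."""
    roles = {role: set() for role in ROLE_WEIGHTS}
    for triple in triples:
        role = ROLE_OF_RELATION.get(triple.relation)
        if role is not None:
            roles[role].add(triple.dependent.lower())
    return roles


def jaccard(a, b):
    """Jaccard overlap of two word sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sentence_similarity(triples_a, triples_b):
    """Weighted sum of per-role word overlap between two parsed sentences."""
    roles_a = words_by_role(triples_a)
    roles_b = words_by_role(triples_b)
    return sum(
        weight * jaccard(roles_a[role], roles_b[role])
        for role, weight in ROLE_WEIGHTS.items()
    )


# Hand-written triples for "The cat chased the mouse" and
# "The cat caught the mouse" (not actual parser output).
sentence_a = [
    Triple("root", "ROOT", "chased"),
    Triple("nsubj", "chased", "cat"),
    Triple("dobj", "chased", "mouse"),
]
sentence_b = [
    Triple("root", "ROOT", "caught"),
    Triple("nsubj", "caught", "cat"),
    Triple("dobj", "caught", "mouse"),
]

score = sentence_similarity(sentence_a, sentence_b)
print(round(score, 2))
```

In this toy pair the subjects and objects match while the predicates differ, so only the subject and object weights contribute to the score. Exact word overlap ignores synonymy ("chased" versus "caught"); a semantic word-similarity measure could replace `jaccard` for the predicate role, which is the kind of information the abstract refers to as semantic similarity.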
