Smart Information Retrieval: Domain Knowledge Centric Optimization Approach

In the age of the Internet of Things, online data have grown significantly in both volume and diversity, and information retrieval has become an important theme in Internet-oriented data science research. This paper introduces a novel domain knowledge centric methodology for improving the accuracy of machine learning methods for relation extraction from text data, which is critical to the accuracy and efficiency of information retrieval-based applications, including recommender systems and sentiment analysis. The proposed methodology contributes to several stages of domain knowledge-based relation extraction: interrogating Linked Open Datasets to generate the relation classification training data, addressing class imbalance in the training datasets, determining the probability threshold of the best-performing learning algorithm, and establishing the optimum parameters for the genetic algorithms used to optimize feature selection for the learning algorithms. The experimental evaluation shows that the adopted machine learning algorithms achieve higher precision and recall in relation extraction in the reduced feature space optimized by our implementation. The machine learning algorithms considered are support vector machine, perceptron algorithm with uneven margins, and K-nearest neighbors. The outcome is verified by comparison against the random mutation hill-climbing optimization algorithm using the Wilcoxon signed-rank statistical test.
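As a rough illustration of genetic-algorithm feature selection in a wrapper setting, the sketch below evolves binary feature masks and scores each mask by the leave-one-out accuracy of a 1-nearest-neighbour classifier. It is not the paper's implementation: the synthetic data, the fitness function, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all names and parameter values (`make_toy_data`, `loo_1nn_accuracy`, `ga_select`, `pop_size`, `generations`, `p_mut`) are assumptions chosen only to keep the example small and self-contained.

```python
import random


def make_toy_data(rng, n_samples=40, n_informative=2, n_noise=6):
    """Synthetic two-class data (hypothetical): the first features
    separate the classes, the remaining ones are pure noise."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 0.5) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def loo_1nn_accuracy(X, y, mask):
    """Wrapper fitness (an assumption, not the paper's measure):
    leave-one-out accuracy of 1-NN restricted to the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_d, best_label = None, None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def ga_select(X, y, rng, pop_size=12, generations=10, p_mut=0.1):
    """Evolve feature masks; returns (best_mask, best_fitness)."""
    n = len(X[0])
    cache = {}

    def fitness(mask):
        # Memoize: each distinct mask is evaluated only once.
        if mask not in cache:
            cache[mask] = loo_1nn_accuracy(X, y, mask)
        return cache[mask]

    # Seed the population with the full feature set so that, with elitism,
    # the result is never worse than using all features.
    pop = [tuple([1] * n)]
    pop += [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)]

    for _ in range(generations):
        elite = max(pop, key=fitness)
        nxt = [elite]  # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per feature.
            child = tuple(b ^ 1 if rng.random() < p_mut else b for b in child)
            nxt.append(child)
        pop = nxt

    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_toy_data(rng)
    mask, score = ga_select(X, y, rng)
    print("selected mask:", mask, "LOO accuracy:", score)
```

In a setting closer to the one described above, the fitness function would presumably be replaced by the precision, recall, or F-measure of the chosen classifier on held-out relation-extraction data, and the population size, mutation rate, and number of generations would themselves be tuned.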
