An ensemble topic extraction approach based on optimization clusters using hybrid multi-verse optimizer for scientific publications

This work proposes a novel hybrid of the multi-verse optimizer (MVO) algorithm and k-means, called H-MVO, for text document clustering (TDC). It also proposes a new ensemble method for automatic topic extraction (TE), which extracts topics from clustered documents drawn from a set of scientific publications in the form of text documents. Existing TE methods often draw on statistical theory, yet their results can differ even when the same clustered documents are used. The extracted topics can therefore be imprecise, owing to the behavior of the individual TE methods. To address this, the strengths of the TE methods are ensembled to improve the accuracy of the extracted topics. The results obtained by H-MVO for TDC were compared against 14 well-regarded methods: five clustering methods, seven metaheuristic algorithms, and two hybrid optimization algorithms. The results produced by the ensemble TE method were compared against those of five established statistical methods from the literature. The findings show that the proposed ensemble TE method outperformed all of the comparative methods on all external measures for almost all datasets. Moreover, the new method combines the complementary advantages of the five previously proposed methods, and thus yields better results.
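As a rough illustration of the ensemble idea described above, the sketch below merges the ranked term lists produced by several extractors for one cluster using a Borda count. The abstract does not specify how the TE methods are combined, so the aggregation rule, the function name `ensemble_topics`, and the sample rankings are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def ensemble_topics(rankings, top_k=3):
    """Merge ranked term lists with a Borda count (an assumed rule).

    rankings: list of lists, each ordered from most to least relevant.
    Returns the top_k terms with the highest combined score.
    """
    scores = defaultdict(float)
    for ranked_terms in rankings:
        n = len(ranked_terms)
        for position, term in enumerate(ranked_terms):
            # The first term in a list of n terms earns n points, the last earns 1.
            scores[term] += n - position
    # Sort by descending score, breaking ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ordered[:top_k]]


# Hypothetical outputs of three extractors for the same cluster.
rankings = [
    ["clustering", "optimizer", "topic"],
    ["topic", "clustering", "document"],
    ["clustering", "topic", "keyword"],
]
topics = ensemble_topics(rankings, top_k=2)
print(topics)
```

A rank-based rule is used here because it needs no score normalization across extractors. A weighted vote or score averaging would be equally plausible choices.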
