An Artificial Intelligence Driven Multi-Feature Extraction Scheme for Big Data Detection

The Internet has accelerated information dissemination, and the volume of unstructured text data used for mass communication continues to grow. Although this abundance of data satisfies a wide range of information needs, it makes it difficult to identify what the public is focusing on in a timely manner. Information extraction from big data has therefore become an important research problem, and big data processing has been widely studied. In this paper, we propose a multi-feature keyword extraction method and, building on it, design an artificial intelligence driven multi-feature extraction (MFE) scheme for big data. We then develop an application example of the general scheme in detail: taking news as the carrier, we apply the scheme to the design of a hot event detection algorithm. The result is a multi-feature fusion clustering algorithm based on user attention, which has two main stages. In the first stage, we develop a multi-feature fusion model that evaluates keywords by combining term frequency and part-of-speech features, and we use it to extract the keywords that represent news items and events. In the second stage, we cluster the news and detect hot events; while forming the news clusters, we analyze several variable parameters to find the settings that give the best effectiveness. Experiments on a news corpus show that the proposed approach performs well.
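
To make the two stages concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that tokens arrive already tagged with a part of speech, that the fusion is a weighted sum controlled by a hypothetical parameter `alpha`, and that news items are grouped by single-pass clustering on Jaccard overlap of their keyword sets. The weights in `POS_WEIGHT`, the `threshold` value, and all function names are illustrative assumptions; the abstract does not specify them, and the user-attention component is not modeled here.

```python
from collections import Counter

# Hypothetical part-of-speech weights; the source does not give values.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.3}


def score_keywords(tagged_tokens, alpha=0.5):
    """Fuse normalized term frequency with a POS weight for each word.

    tagged_tokens: list of (word, pos) pairs for one news item.
    alpha: assumed mixing weight between the two features.
    """
    counts = Counter(word for word, _ in tagged_tokens)
    max_count = max(counts.values())
    pos_of = {}
    for word, pos in tagged_tokens:
        pos_of.setdefault(word, pos)  # keep the first tag seen for each word
    return {
        word: alpha * (count / max_count)
        + (1 - alpha) * POS_WEIGHT.get(pos_of[word], 0.0)
        for word, count in counts.items()
    }


def top_keywords(tagged_tokens, k=3, alpha=0.5):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = score_keywords(tagged_tokens, alpha)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def jaccard(a, b):
    """Jaccard similarity of two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_news(keyword_sets, threshold=0.3):
    """Single-pass clustering: an item joins the first cluster whose
    seed item (its first member) is similar enough, else starts a new one."""
    clusters = []  # each cluster is a list of indices into keyword_sets
    for i, kws in enumerate(keyword_sets):
        for cluster in clusters:
            if jaccard(kws, keyword_sets[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

In this sketch, a news item would first be reduced to its top keywords, and the resulting keyword sets would then be passed to `cluster_news`. Parameters such as `alpha`, `k`, and `threshold` are the kind of variable settings one would tune when forming clusters; ranking the clusters to pick hot events is left out because the abstract does not describe how user attention is measured.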
