An Efficient and Unique TF/IDF Algorithmic Model-Based Data Analysis for Handling Applications with Big Data Streaming

As the field of data science grows, document analytics tasks such as rough classification, response analysis, and text summarization have become more challenging. These tasks are used to analyze text data from various intelligent sensing systems. Conventional approaches to data analytics and text processing are not well suited to the big data produced by such systems. This work proposes a novel TF/IDF algorithm combined with the temporal Louvain approach to address this problem. The approach is intended to help categorize documents into hierarchical structures that show the relationships between variables, which helps analysts who must make essential decisions. Public corpora such as Reuters-21578 and 20 Newsgroups were used for massive-data analytic experimentation. Big data handling with map-reduce has brought considerable growth in, and support for, tasks such as categorization and sentiment analysis, along with higher accuracy on the input data. The results show that the proposed algorithm outperforms the state-of-the-art approach in both accuracy and execution time on six datasets, which validates its value for big text data analysis.
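For readers unfamiliar with the weighting scheme the abstract builds on, the following is a minimal sketch of the standard textbook TF/IDF computation. It is not the novel variant proposed in this work and omits the temporal Louvain step and the map-reduce distribution. The function name `tf_idf` and the toy corpus are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook weighting (an assumption, not the paper's variant):
    tf = term count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized on whitespace.
corpus = [
    "stocks fall as oil prices rise".split(),
    "oil exports rise".split(),
    "new graphics card released".split(),
]
scores = tf_idf(corpus)
```

In this sketch a term that appears in fewer documents receives a larger weight than an equally frequent term that appears in many, and a term present in every document receives a weight of zero.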
