Analysis and processing aspects of data in big data applications

Abstract: Data analysis and processing play an important role in big data applications because of the large volume of data generated by diverse sources. Data quality is the main concern during data acquisition, transformation, and pre-processing. Pre-processing is required because the data generated in big data applications is often inconsistent, noisy, and incomplete. Data analysis encompasses methods and functions that are applied to data to detect characteristics such as data type, size, format, and patterns. Once the data format is known, it becomes easier to assess data quality for further use in various applications. Data analysis and processing involve several steps: identifying data quality, statistical analysis of the data, model definition, hypothesis testing of the model, and interpretation of the analysis results. Raw data is data that has not yet been used; it requires analysis, filtering, and processing before any system can rely on it. This paper covers the analysis and processing of raw and cleaned data in big data applications, as well as data cleaning and the concepts behind its implementation.
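As a rough illustration of the profiling and cleaning steps the abstract describes, the following minimal Python sketch infers a coarse type for each field of a few raw records and then removes incomplete and duplicate rows. It is not the paper's implementation: the sample records, the `MISSING` marker set, and the function names `infer_type`, `profile`, and `clean` are all assumptions introduced here for illustration.

```python
from collections import Counter

# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"id": "1", "city": " Paris ", "temp": "21.5"},
    {"id": "2", "city": "Paris", "temp": ""},
    {"id": "3", "city": "Berlin", "temp": "n/a"},
    {"id": "1", "city": "Paris", "temp": "21.5"},
]

# Assumed set of strings that stand for a missing value.
MISSING = {"", "n/a", "null", "none"}


def infer_type(value):
    """Guess a coarse type label for one raw string value."""
    text = value.strip()
    if text.lower() in MISSING:
        return "missing"
    for label, cast in (("int", int), ("float", float)):
        try:
            cast(text)
            return label
        except ValueError:
            continue
    return "str"


def profile(records):
    """Count the inferred type labels seen in each field."""
    summary = {}
    for record in records:
        for field, value in record.items():
            summary.setdefault(field, Counter())[infer_type(value)] += 1
    return summary


def clean(records):
    """Trim whitespace, drop incomplete rows, and remove exact duplicates."""
    seen, result = set(), []
    for record in records:
        row = {field: value.strip() for field, value in record.items()}
        if any(infer_type(value) == "missing" for value in row.values()):
            continue  # incomplete row
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        result.append(row)
    return result


if __name__ == "__main__":
    print(profile(RAW))
    print(clean(RAW))
```

Profiling before cleaning keeps the two concerns separate: the per-field type counts show where the data is noisy or incomplete, and the cleaning pass then applies one simple policy (drop rather than repair). A real pipeline would likely choose a policy per field.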
