Big data: From beginning to future

We use structuralism and functionalism paradigms to analyze the origins of big data applications. Current trends and sources of big data are examined. Processing technologies, methods, and analysis techniques for big data are compared in detail. We analyze major challenges with big data and discuss several opportunities. Case studies and emerging technologies for big data problems are discussed.

Big data is a promising research area receiving considerable attention from academia and the IT community. In the digital world, the amount of data generated and stored has expanded rapidly within a short period of time. This fast-growing volume of data has created many challenges. In this paper, we use structuralism and functionalism paradigms to analyze the origins of big data applications and their current trends. The paper presents a comprehensive discussion of state-of-the-art big data technologies based on batch and stream data processing, and analyzes the strengths and weaknesses of these technologies. It also discusses big data analytics techniques, processing methods, reported case studies from different vendors, several open research challenges, and the opportunities brought about by big data. The similarities and differences of these techniques and technologies are investigated with respect to important parameters. Emerging technologies are recommended as a solution for big data problems.
