Introduction to Big Data Technology

Big data is no longer “all just hype”; it is widely applied across nearly all aspects of business, government, and other organizations, together with the AI technology stack. Its influence goes far beyond a simple technical innovation and reaches all areas of the world. This chapter first gives a historical review of big data, followed by a discussion of its characteristics, i.e., from the 3 V’s up to the 10 V’s of big data. The chapter then introduces the technology stacks an organization needs to build a big data application, from infrastructure, platform, and ecosystem to constructional units and components. Finally, we provide some online big data resources for reference.
