The big data system, components, tools, and technologies: a survey
Pabitra Mitra | T. Ramalingeswara Rao | Ravindara Bhatt | A. Goswami
[1] Jignesh M. Patel,et al. Column-Oriented Storage Techniques for MapReduce , 2011, Proc. VLDB Endow..
[2] Keshav Pingali,et al. Parallel graph analytics , 2016, Commun. ACM.
[3] Rob Kitchin,et al. The data revolution : big data, open data, data infrastructures & their consequences , 2014 .
[4] Milind Bhandarkar,et al. HAWQ: a massively parallel processing SQL engine in hadoop , 2014, SIGMOD Conference.
[5] Gianmarco De Francisci Morales,et al. SAMOA: scalable advanced massive online analysis , 2015, J. Mach. Learn. Res..
[6] Murray Campbell,et al. Analytics Ecosystem Transformation: A Force for Business Model Innovation , 2011, 2011 Annual SRII Global Conference.
[7] Phil Trinder,et al. Scalable persistent storage for Erlang: theory and practice , 2013, Erlang '13.
[8] Joseph Gonzalez,et al. GraphLab: A Distributed Framework for Machine Learning in the Cloud , 2011, ArXiv.
[9] Marcin Zukowski,et al. Vectorwise: Beyond Column Stores , 2012, IEEE Data Eng. Bull..
[10] Leslie Lamport,et al. Distributed snapshots: determining global states of distributed systems , 1985, TOCS.
[11] Joshua Zhexue Huang,et al. Big data analytics on Apache Spark , 2016, International Journal of Data Science and Analytics.
[12] Carlo Curino,et al. Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications , 2015, SIGMOD Conference.
[13] Yunhuai Liu,et al. The big data analytics and applications of the surveillance system using video structured description technology , 2016, Cluster Computing.
[14] Paul Zikopoulos,et al. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data , 2011 .
[15] Vladimir Batagelj,et al. Pajek - Program for Large Network Analysis , 1999 .
[16] Riccardo Miotto,et al. Translational bioinformatics in the era of real-time biomedical, health care and wellness data streams , 2016, Briefings Bioinform..
[17] Emad A. Mohammed,et al. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends , 2014, BioData Mining.
[18] Cory Doctorow,et al. Big data: Welcome to the petacentre , 2008, Nature.
[19] Reynold Xin,et al. Apache Spark , 2016 .
[20] Yonggang Wen,et al. Toward Scalable Systems for Big Data Analytics: A Technology Tutorial , 2014, IEEE Access.
[21] Shahriar Akter,et al. Big data analytics in E-commerce: a systematic review and agenda for future research , 2016, Electronic Markets.
[22] David A. Schweidel,et al. Opportunities for Innovation in Social Media Analytics , 2017 .
[23] Michael J. Freedman,et al. Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area , 2014, NSDI.
[24] Gordon S. Blair,et al. A generic component model for building systems software , 2008, TOCS.
[25] V. Ganesh,et al. HBase and Hypertable for large scale distributed storage systems A Performance evaluation for Open Source BigTable Implementations , 2008 .
[26] Reynold Xin,et al. GraphX: Unifying Data-Parallel and Graph-Parallel Analytics , 2014, ArXiv.
[27] Reynold Xin,et al. GraphX: a resilient distributed graph system on Spark , 2013, GRADES.
[28] Michael Stonebraker,et al. Aurora: a data stream management system , 2003, SIGMOD '03.
[29] Joseph K. Bradley,et al. Spark SQL: Relational Data Processing in Spark , 2015, SIGMOD Conference.
[30] Mathieu Bastian,et al. Gephi: An Open Source Software for Exploring and Manipulating Networks , 2009, ICWSM.
[31] Yuzhong Qu,et al. Falcon-AO: A practical ontology matching system , 2008, J. Web Semant..
[32] Erhard Rahm,et al. GRADOOP: Scalable Graph Data Management and Analytics with Hadoop - Technical Report , 2015 .
[33] Michael Stonebraker,et al. C-Store: A Column-oriented DBMS , 2005, VLDB.
[34] Harald Lampesberger,et al. Technologies for Web and cloud service interaction: a survey , 2014, Service Oriented Computing and Applications.
[35] Eunmi Choi,et al. A Taxonomy and Survey on Distributed File Systems , 2008, 2008 Fourth International Conference on Networked Computing and Advanced Information Management.
[36] Rolf Apweiler,et al. The European Bioinformatics Institute in 2017: data coordination and integration , 2017, Nucleic Acids Res..
[37] Eric A. Brewer,et al. A certain freedom: thoughts on the CAP theorem , 2010, PODC.
[38] Werner Vogels,et al. Dynamo: amazon's highly available key-value store , 2007, SOSP.
[39] Lei Gao,et al. Serving large-scale batch computed data with project Voldemort , 2012, FAST.
[40] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[41] Jennifer Widom,et al. GPS: a graph processing system , 2013, SSDBM.
[42] Andreas Neumann,et al. Oozie: towards a scalable workflow management system for Hadoop , 2012, SWEET '12.
[43] Ioana Manolescu,et al. Web Data Management , 2011 .
[44] Jose-Norberto Mazón,et al. A survey on summarizability issues in multidimensional modeling , 2009, Data Knowl. Eng..
[45] Frank B. Schmuck,et al. GPFS: A Shared-Disk File System for Large Computing Clusters , 2002, FAST.
[46] Gábor Csárdi,et al. The igraph software package for complex network research , 2006 .
[47] Sriram Rao,et al. The Quantcast File System , 2013, Proc. VLDB Endow..
[48] Daniel Mills,et al. MillWheel: Fault-Tolerant Stream Processing at Internet Scale , 2013, Proc. VLDB Endow..
[49] Scott Shenker,et al. Discretized streams: fault-tolerant streaming computation at scale , 2013, SOSP.
[50] V. Marx. Biology: The big challenges of big data , 2013, Nature.
[51] Matteo Muratori,et al. Big Data issues and opportunities for electric utilities , 2015 .
[52] Vijay V. Raghavan,et al. Big Data: Promises and Problems , 2015, Computer.
[53] Leonardo Neumeyer,et al. S4: Distributed Stream Computing Platform , 2010, 2010 IEEE International Conference on Data Mining Workshops.
[54] Rajkumar Buyya,et al. Distributed data stream processing and edge computing: A survey on resource elasticity and future directions , 2017, J. Netw. Comput. Appl..
[55] Sanjeev Kumar,et al. Finding a Needle in Haystack: Facebook's Photo Storage , 2010, OSDI.
[56] Joseph Gonzalez,et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.
[57] Randy H. Katz,et al. Chukwa: A System for Reliable Large-Scale Log Collection , 2010, LISA.
[58] Holly Bell. QlikView 11 For Developers , 1991 .
[59] Martin L. Kersten,et al. MonetDB: Two Decades of Research in Column-oriented Database Architectures , 2012, IEEE Data Eng. Bull..
[60] Muhammad Anis Uddin Nasir. Fault Tolerance for Stream Processing Engines , 2016, ArXiv.
[61] Keshav Pingali,et al. A lightweight infrastructure for graph analytics , 2013, SOSP.
[62] Xindong Wu,et al. Data mining with big data , 2014, IEEE Transactions on Knowledge and Data Engineering.
[63] Siddharth Swarup Rautaray,et al. Real time financial analysis using big data technologies , 2017, 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC).
[64] Ameet Talwalkar,et al. MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..
[65] Sun-Yuan Kung. Visualization of big data , 2015, ICCI*CC.
[66] Kathleen Ting,et al. Apache Sqoop Cookbook , 2013 .
[67] Shu-Ching Chen,et al. Multimedia Big Data Analytics , 2018, ACM Comput. Surv..
[68] Sanjay Ghemawat,et al. The Google file system , 2003 .
[69] Rajkumar Buyya,et al. Big Data: Principles and Paradigms , 2016 .
[70] Madhu Siddalingaiah,et al. Pro Apache Hadoop , 2014, Apress.
[71] Terry Jones,et al. Performance of the IBM general parallel file system , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[72] Jignesh M. Patel,et al. Twitter Heron: Stream Processing at Scale , 2015, SIGMOD Conference.
[73] Rodney Landrum,et al. Transparent Data Encryption , 2009 .
[74] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[75] Gustavo Alonso,et al. Understanding replication in databases and distributed systems , 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.
[76] Tony Greicius. Managing the Deluge of 'Big Data' From Space , 2015 .
[77] Ben Shneiderman,et al. Analyzing Social Media Networks with NodeXL: Insights from a Connected World , 2010 .
[78] Felix Naumann,et al. The Stratosphere platform for big data analytics , 2014, The VLDB Journal.
[79] Jorge Bernardino,et al. Choosing the right NoSQL database for the job: a quality attribute evaluation , 2015, Journal of Big Data.
[80] Michael Hausenblas,et al. Apache Drill: Interactive Ad-Hoc Analysis at Scale , 2013, Big Data.
[81] Syed Akhter Hossain,et al. NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison , 2013, ArXiv.
[82] Ramakrishna Varadarajan,et al. The Vertica Analytic Database: C-Store 7 Years Later , 2012, Proc. VLDB Endow..
[83] Scott Shenker,et al. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters , 2012, HotCloud.
[84] Liang Lin,et al. Tenzing a SQL implementation on the MapReduce framework , 2011, Proc. VLDB Endow..
[85] Cheng Soon Ong,et al. Multivariate spearman's ρ for aggregating ranks using copulas , 2016 .
[86] Jimeng Sun,et al. Big data analytics for healthcare , 2013, KDD.
[87] Sherif Sakr,et al. The family of mapreduce and large-scale data processing systems , 2013, CSUR.
[88] Wei Fan,et al. Mining big data: current status, and forecast to the future , 2013, SKDD.
[89] Bing Liu,et al. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data , 2006, Data-Centric Systems and Applications.
[90] Daniel E. O'Leary,et al. Big Data and Privacy: Emerging Issues , 2015, IEEE Intelligent Systems.
[91] Omer F. Rana,et al. Modelling Performance & Resource Management in Kubernetes , 2016, 2016 IEEE/ACM 9th International Conference on Utility and Cloud Computing (UCC).
[92] Christophe Nicolle,et al. Understandable Big Data: A survey , 2015, Comput. Sci. Rev..
[93] Cong Wang,et al. Twitter Heron: Towards Extensible Streaming Engines , 2017, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).
[94] Rajkumar Buyya,et al. Big Data computing and clouds: Trends and future directions , 2013, J. Parallel Distributed Comput..
[95] Véronique Van Vlasselaer,et al. Fraud Analytics: A Broader Perspective , 2015 .
[96] Yon Dohn Chung,et al. Parallel data processing with MapReduce: a survey , 2012, SGMD.
[97] John Krumm,et al. User-Generated Content , 2008, IEEE Pervasive Comput..
[98] Seref Sagiroglu,et al. Big data: A review , 2013, 2013 International Conference on Collaboration Technologies and Systems (CTS).
[99] Mazliza Othman,et al. Internet of Things security: A survey , 2017, J. Netw. Comput. Appl..
[100] Joseph M. Hellerstein,et al. GraphLab: A New Framework For Parallel Machine Learning , 2010, UAI.
[101] Mohsen Guizani,et al. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications , 2015, IEEE Communications Surveys & Tutorials.
[102] Tridib Mukherjee,et al. An Availability Analysis Approach for Deployment Configurations of Containers , 2021, IEEE Transactions on Services Computing.
[103] Sherif Sakr,et al. Big Data 2.0 Processing Systems: Taxonomy and Open Challenges , 2016, Journal of Grid Computing.
[104] Rajkumar Buyya,et al. The anatomy of big data computing , 2015, Softw. Pract. Exp..
[105] Nanning Zheng,et al. Knowledge Engineering With Big Data (BigKE): A 54-Month, 45-Million RMB, 15-Institution National Grand Project , 2017, IEEE Access.
[106] Rajiv Ranjan,et al. A note on software tools and technologies for delivering smart media-optimized big data applications in the cloud , 2015, Computing.
[107] Seif Haridi,et al. Apache Flink™: Stream and Batch Processing in a Single Engine , 2015, IEEE Data Eng. Bull..
[108] N. B. Anuar,et al. The rise of "big data" on cloud computing: Review and open research issues , 2015, Inf. Syst..
[109] Pawel Terlecki,et al. An analytic data engine for visualization in tableau , 2011, SIGMOD '11.
[110] Sergei D. Kuznetsov,et al. NoSQL data management systems , 2014, Programming and Computer Software.
[111] Robert J. Meijer,et al. Dynamically Scaling Apache Storm for the Analysis of Streaming Data , 2015, 2015 IEEE First International Conference on Big Data Computing Service and Applications.
[112] A. Psyllidis,et al. A Platform for Urban Analytics and Semantic Data Integration in City Planning , 2015, CAAD Futures.
[113] Xiangfeng Wang,et al. Machine learning for Big Data analytics in plants. , 2014, Trends in plant science.
[114] Murtaza Haider,et al. Beyond the hype: Big data concepts, methods, and analytics , 2015, Int. J. Inf. Manag..
[115] Santo Fortunato,et al. Community detection in graphs , 2009, ArXiv.
[116] Pangfeng Liu,et al. Kylin: An efficient and scalable graph data processing system , 2013, 2013 IEEE International Conference on Big Data.
[117] Marcin Zukowski,et al. MonetDB/X100: Hyper-Pipelining Query Execution , 2005, CIDR.
[118] In Lee,et al. Big data: Dimensions, evolution, impacts, and challenges , 2017 .
[119] Viju Raghupathi,et al. An Overview of Health Analytics , 2013 .
[120] Nascif A. Abousalh-Neto,et al. Big data exploration through visual analytics , 2012, IEEE VAST.
[121] Chris Mattmann,et al. Computing: A vision for data science , 2013, Nature.
[122] Wilson C. Hsieh,et al. Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.
[123] Reynold Xin,et al. GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.
[124] Edward Sciore,et al. SimpleDB: a simple java-based multiuser system for teaching database internals , 2007, SIGCSE.
[125] Mahadev Konar,et al. ZooKeeper: Wait-free Coordination for Internet-scale Systems , 2010, USENIX ATC.
[126] Soon Ae Chun,et al. Linking and using social media data for enhancing public health analytics , 2017, J. Inf. Sci..
[127] Holger Ziekow,et al. Towards a Big Data Analytics Framework for IoT and Smart City Applications , 2015 .
[128] Veda C. Storey,et al. Business Intelligence and Analytics: From Big Data to Big Impact , 2012, MIS Q..
[129] Lawrence B. Holder,et al. Mining Graph Data , 2006 .
[130] Lihua Huang,et al. E-business Ecosystem and its Evolutionary Path: The Case of the Alibaba Group in China , 2010, Pac. Asia J. Assoc. Inf. Syst..
[131] Jay Kreps,et al. Kafka : a Distributed Messaging System for Log Processing , 2011 .
[132] Christian Darabos,et al. The multiscale backbone of the human phenotype network based on biological pathways , 2014, BioData Mining.
[133] Michael Isard,et al. TidyFS: A Simple and Small Distributed File System , 2011, USENIX Annual Technical Conference.
[134] C. L. Philip Chen,et al. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data , 2014, Inf. Sci..
[135] Michael Stonebraker,et al. A comparison of approaches to large-scale data analysis , 2009, SIGMOD Conference.
[136] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[137] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.
[138] Niko Pollner,et al. A Data Center Infrastructure Monitoring Platform Based on Storm and Trident , 2017, BTW.
[139] Ian Gorton,et al. The Changing Paradigm of Data-Intensive Computing , 2009, Computer.
[140] Réka Albert,et al. Near linear time algorithm to detect community structures in large-scale networks. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.
[141] Hairong Kuang,et al. The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[142] Rachid Guerraoui,et al. Fault-Tolerance by Replication in Distributed Systems , 1996, Ada-Europe.
[143] Yunhao Liu,et al. Big Data: A Survey , 2014, Mob. Networks Appl..
[144] Zheng Shao,et al. Hive - a petabyte scale data warehouse using Hadoop , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).
[145] Alexandros Labrinidis,et al. Challenges and Opportunities with Big Data , 2012, Proc. VLDB Endow..
[146] María José del Jesús,et al. Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks , 2014, WIREs Data Mining Knowl. Discov..
[147] Umar Farooq Minhas,et al. SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures , 2014, Proc. VLDB Endow..
[148] Rein Ahas,et al. Measuring tourism destinations using mobile tracking data , 2016 .
[149] Dirk Neumann,et al. Bringing Analytics into Practice: Evidence from the Power Sector , 2016, ICIS.
[150] Xindong Wu,et al. Knowledge Engineering with Big Data , 2015, IEEE Intell. Syst..
[151] David B Nash. Harnessing the power of big data in healthcare. , 2014, American health & drug benefits.
[152] Athanasios V. Vasilakos,et al. Big data: From beginning to future , 2016, Int. J. Inf. Manag..
[153] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[154] K. Morik. A Survey of the Stream Processing Landscape , 2014 .
[155] María S. Pérez-Hernández,et al. Spark Versus Flink: Understanding Performance in Big Data Analytics Frameworks , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[156] Philippe Dobbelaere,et al. Kafka versus RabbitMQ: A comparative study of two industry reference publish/subscribe implementations: Industry Paper , 2017, DEBS.
[157] Daniel A. Keim,et al. Visual analytics for the big data era — A comparative review of state-of-the-art commercial systems , 2012, 2012 IEEE Conference on Visual Analytics Science and Technology (VAST).
[158] Sanming Zhou,et al. Networking for Big Data: A Survey , 2017, IEEE Communications Surveys & Tutorials.
[159] Carlo Curino,et al. Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.
[160] Beng Chin Ooi,et al. In-Memory Big Data Management and Processing: A Survey , 2015, IEEE Transactions on Knowledge and Data Engineering.
[161] Kyumars Sheykh Esmaili,et al. Kafka versus RabbitMQ , 2017, ArXiv.
[162] Barbara Martini,et al. The Data Revolution. Big Data, Open Data, Data Infrastructures and Their Consequences , 2016 .
[163] Véronique Van Vlasselaer,et al. Fraud Analytics : Using Descriptive, Predictive, and Social Network Techniques:A Guide to Data Science for Fraud Detection , 2015 .
[164] Hyeontaek Lim,et al. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage , 2014, NSDI.
[165] David Taniar,et al. Sensor data management in the cloud: Data storage, data ingestion, and data retrieval , 2018, Concurr. Comput. Pract. Exp..
[166] Xavier Franch,et al. A software reference architecture for semantic-aware Big Data systems , 2017, Inf. Softw. Technol..
[167] Zhenlong Li,et al. Big Data and cloud computing: innovation opportunities and challenges , 2017, Int. J. Digit. Earth.
[168] Vijay V. Raghavan,et al. NoSQL Systems for Big Data Management , 2014, 2014 IEEE World Congress on Services.
[169] Werner Vogels,et al. Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability. , 2022 .
[170] Gheorghe Matei,et al. Column-Oriented Databases, an Alternative for Analytical Environment , 2010 .
[171] Lawrence B. Holder,et al. Mining Graph Data , 2006 .
[172] Vipin Kumar,et al. Trends in big data analytics , 2014, J. Parallel Distributed Comput..
[173] Xiaoyong Du,et al. A Study of SQL-on-Hadoop Systems , 2014, BPOE@ASPLOS/VLDB.
[174] A. Lo,et al. A Survey of Systemic Risk Analytics , 2012 .
[175] S. Fawcett,et al. Data Science, Predictive Analytics, and Big Data: A Revolution that Will Transform Supply Chain Design and Management , 2013 .
[176] Jignesh M. Patel,et al. Storm@twitter , 2014, SIGMOD Conference.
[177] Trey Ideker,et al. Cytoscape 2.8: new features for data integration and network visualization , 2010, Bioinform..
[178] Cees T. A. M. de Laat,et al. Defining architecture components of the Big Data Ecosystem , 2014, 2014 International Conference on Collaboration Technologies and Systems (CTS).
[179] Jason J. Jung,et al. Social big data: Recent achievements and new challenges , 2015, Information Fusion.
[180] Ayoub Ait Lahcen,et al. Big Data technologies: A survey , 2017, J. King Saud Univ. Comput. Inf. Sci..
[181] Arun Sharma,et al. Scalable machine‐learning algorithms for big data analytics: a comprehensive review , 2016, Wiley Interdiscip. Rev. Data Min. Knowl. Discov..
[182] Antonio Iera,et al. The Internet of Things: A survey , 2010, Comput. Networks.
[183] Gang Chen,et al. MemepiC: Towards a Unified In-Memory Big Data Management System , 2019, IEEE Transactions on Big Data.
[184] Prashant Malik,et al. Cassandra: a decentralized structured storage system , 2010, OPSR.