Spark: A Big Data Processing Platform Based on Memory Computing

Spark is an in-memory computing framework that offers strong computing performance and fault tolerance and supports batch, interactive, iterative, and stream processing. In this paper, we analyze Spark's primary framework and core technologies and point out its advantages and disadvantages. Finally, we discuss future trends in Spark technologies.
