ECL/HPCC: A Unified Approach to Big Data

As a result of the continuing information explosion, many organizations now face what is called the “Big Data” problem: their datasets have grown too large to process in a timely manner, so they cannot make effective use of the massive amounts of data they hold. Data-intensive computing represents a new computing paradigm [26] that can address the Big Data problem with high-performance architectures supporting scalable parallel processing. These architectures allow government and commercial organizations and research environments to process massive amounts of data and to implement applications previously thought to be impractical or infeasible.

[1]  Ravi Kumar,et al.  Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.

[2]  Sanjay Ghemawat,et al.  MapReduce: a flexible data processing tool , 2010, CACM.

[3]  Reagan Moore,et al.  Data-intensive computing , 1998 .

[4]  Sandhya Dwarkadas,et al.  Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations , 2001, PPoPP '01.

[5]  David B. Skillicorn,et al.  Models and languages for parallel computation , 1998, CSUR.

[6]  Robert L. Grossman,et al.  Compute and storage clouds using wide area high performance networks , 2008, Future Gener. Comput. Syst..

[7]  Robert L. Grossman,et al.  Data mining using high performance data clouds: experimental studies using sector and sphere , 2008, KDD.

[8]  Rajkumar Buyya,et al.  High Performance Cluster Computing , 1999 .

[9]  Joseph M. Hellerstein,et al.  The declarative imperative: experiences and conjectures in distributed logic , 2010, SIGMOD Rec..

[10]  Huan Liu,et al.  GridBatch: Cloud Computing for Large-Scale Data-Intensive Batch Applications , 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).

[11]  Michael Isard,et al.  Distributed aggregation for data-parallel computing: interfaces and implementations , 2009, SOSP '09.

[12]  Ian Gorton,et al.  The Changing Paradigm of Data-Intensive Computing , 2009, Computer.

[13]  Rob Pike,et al.  Interpreting the data: Parallel analysis with Sawzall , 2005, Sci. Program..

[14]  Francine Berman,et al.  Got data?: a guide to data preservation in the information age , 2008, CACM.

[15]  Ahmar Abbas,et al.  Grid Computing: A Practical Guide to Technology and Applications , 2003 .

[16]  Jingren Zhou,et al.  SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[17]  Jim Gray,et al.  Distributed Computing Economics , 2004, ACM Queue.

[18]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[19]  Joseph M. Hellerstein The declarative imperative , 2010 .

[20]  Maya Gokhale,et al.  Hardware Technologies for High-Performance Data-Intensive Computing , 2008, Computer.

[21]  Christopher Olston,et al.  Building a High-Level Dataflow System on top of MapReduce: The Pig Experience , 2009, Proc. VLDB Endow..

[22]  Robert L. Grossman,et al.  Lessons learned from a year's worth of benchmarks of large data clouds , 2009, MTAGS '09.

[23]  Alexander S. Szalay,et al.  Data-Intensive Computing in the 21st Century , 2008, Computer.

[24]  William E. Johnston High-speed, wide area, data intensive computing: a ten year retrospective , 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).

[25]  Michael Stonebraker,et al.  A comparison of approaches to large-scale data analysis , 2009, SIGMOD Conference.

[26]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[27]  Rajkumar Buyya,et al.  Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility , 2009, Future Gener. Comput. Syst..

[28]  Lars S. Nyland,et al.  A Design Methodology for Data-Parallel Applications , 2000, IEEE Trans. Software Eng..

[29]  Eugene Agichtein Scaling Information Extraction to Large Document Collections , 2005, IEEE Data Eng. Bull..

[30]  Vinton G. Cerf An information avalanche , 2007, Computer.

[31]  Xavier Llorà,et al.  Meandre: Semantic-Driven Data-Intensive Flows in the Clouds , 2008, 2008 IEEE Fourth International Conference on eScience.

[32]  Eugene Agichtein,et al.  Mining reference tables for automatic text segmentation , 2004, KDD.