Data-Intensive Technologies for Cloud Computing

As a result of the continuing information explosion, many organizations are drowning in data, and the resulting “data gap”, the inability to process this information and use it effectively, is widening at an alarming rate. Data-intensive computing represents a new computing paradigm (Kouzes, Anderson, Elbert, Gorton, & Gracio, 2009) that can address this gap by using scalable parallel processing, allowing government agencies, commercial organizations, and research environments to process massive amounts of data and implement applications previously thought to be impractical or infeasible. Cloud computing gives organizations with limited internal resources the opportunity to implement large-scale data-intensive computing applications in a cost-effective manner.

[1] Jim Gray. Distributed Computing Economics. ACM Queue, 2004.

[2] Tom White. Hadoop: The Definitive Guide. 2009.

[3] Vinton G. Cerf. An information avalanche. Computer, 2007.

[4] Xavier Llorà et al. Meandre: Semantic-Driven Data-Intensive Flows in the Clouds. IEEE Fourth International Conference on eScience, 2008.

[5] Maya Gokhale et al. Hardware Technologies for High-Performance Data-Intensive Computing. Computer, 2008.

[6] Brian Hayes. Cloud Computing. Communications of the ACM, 2008.

[7] Ronnie Chaiken et al. SCOPE: easy and efficient parallel processing of massive data sets. Proc. VLDB Endow., 2008.

[8] Ahmar Abbas et al. Grid Computing: A Practical Guide to Technology and Applications. 2003.

[9] Serge Abiteboul et al. Searching Shared Content in Communities with the Data Ring. IEEE Data Eng. Bull., 2009.

[10] Robert L. Grossman et al. Compute and storage clouds using wide area high performance networks. Future Gener. Comput. Syst., 2008.

[11] David B. Skillicorn et al. Models and languages for parallel computation. ACM Computing Surveys, 1998.

[12] Robert L. Grossman et al. Data mining using high performance data clouds: experimental studies using Sector and Sphere. KDD, 2008.

[13] P. Mell et al. The NIST Definition of Cloud Computing. 2011.

[14] Yuan Yu et al. Distributed aggregation for data-parallel computing: interfaces and implementations. SOSP, 2009.

[15] Neal Leavitt. Anonymization Technology Takes a High Profile. Computer, 2009.

[16] Christopher Olston et al. Pig Latin: a not-so-foreign language for data processing. SIGMOD Conference, 2008.

[17] Robert L. Grossman et al. The Case for Cloud Computing. IT Professional, 2009.

[18] Sandhya Dwarkadas et al. Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. PPoPP, 2001.

[19] Reagan Moore et al. Data-intensive computing. 1998.

[20] Jason Venner. Pro Hadoop. 2009.

[21] George Reese. Cloud Application Architectures. 2009.

[22] Fay Chang et al. Bigtable: A Distributed Storage System for Structured Data. ACM Transactions on Computer Systems, 2006.

[23] Neal Leavitt. Is Cloud Computing Really Ready for Prime Time? Computer, 2009.

[24] Rajkumar Buyya et al. Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility. Future Generation Computer Systems, 2009.

[25] Alexander Lenk et al. What's inside the Cloud? An architectural map of the Cloud landscape. ICSE Workshop on Software Engineering Challenges of Cloud Computing, 2009.

[26] Lars S. Nyland et al. A Design Methodology for Data-Parallel Applications. IEEE Trans. Software Eng., 2000.

[27] John Viega. Cloud Computing and the Common Man. Computer, 2009.

[28] Richard T. Kouzes et al. The Changing Paradigm of Data-Intensive Computing. Computer, 2009.

[29] Rob Pike et al. Interpreting the data: Parallel analysis with Sawzall. Sci. Program., 2005.

[30] Francine Berman. Got data? A guide to data preservation in the information age. Communications of the ACM, 2008.

[31] Alan F. Gates et al. Building a High-Level Dataflow System on top of MapReduce: The Pig Experience. Proc. VLDB Endow., 2009.

[32] Robert L. Grossman et al. Lessons learned from a year's worth of benchmarks of large data clouds. MTAGS, 2009.

[33] Ian Gorton et al. Data-Intensive Computing in the 21st Century. Computer, 2008.

[34] Huan Liu et al. GridBatch: Cloud Computing for Large-Scale Data-Intensive Batch Applications. Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2008.

[35] Sanjay Ghemawat et al. The Google file system. SOSP, 2003.

[36] Toby Velte et al. Cloud Computing: A Practical Approach. 2009.

[37] Eugene Agichtein et al. Mining reference tables for automatic text segmentation. KDD, 2004.

[38] William E. Johnston. High-speed, wide area, data intensive computing: a ten year retrospective. Seventh International Symposium on High Performance Distributed Computing, 1998.

[39] Luis M. Vaquero et al. A break in the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication Review, 2008.

[40] Andrew Pavlo et al. A comparison of approaches to large-scale data analysis. SIGMOD Conference, 2009.

[41] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.

[42] Sarah Scalia. A Break in the Clouds. 2006.

[43] Jeff Napper and Paolo Bientinesi. Can cloud computing reach the top500? UCHPC-MAW, 2009.

[44] Eugene Agichtein. Scaling Information Extraction to Large Document Collections. IEEE Data Eng. Bull., 2005.