Embrace the Challenges: Software Engineering in a Big Data World

The design and development of data-intensive software systems -- systems that generate, collect, store, process, analyze, query, and visualize large sets of data -- is fraught with significant challenges, both technical and social. Project EPIC has been designing and developing data-intensive systems in support of crisis informatics research since Fall 2009, and that work has given us insight into these challenges. In this paper, we share our experience working in this design space and describe the choices we made in tackling these challenges, along with their attendant trade-offs. We highlight the lack of developer support tools for data-intensive systems, the importance of multidisciplinary teams, the use of highly iterative life cycles, the need for a deep understanding of the frameworks and technologies used in data-intensive systems, how simple operations transform into significant challenges at scale, and the paramount significance of data modeling in producing systems that are scalable, robust, and efficient.
