Big Data Analytics Software

What is big data software? How is it different from non-big-data software? Can it be engineered? Answering these questions requires first defining big data. Attempts at defining it vary based on context, domain, and perspective. From the infrastructure’s perspective, big data has been defined as data with high volume, velocity, and variety (3V), and unpredictability. In this context, it has also been defined as data with some aspect that’s so large that current, typical methods can’t be used to process it.1,2 From the analytics’ perspective, big data has been defined as data so large that it contains significant low-probability events that would be absent from traditional statistical sampling methods.3 From the business user’s perspective, big data represents opportunities for competitive advantage through actionable intelligence.4 Each of these definitions describes important aspects that big data software must support.

Borrowing from these definitions, we propose a definition for big data software as “software that supports the time-constrained processing of continuous information flows to provide actionable intelligence.”

The phrase “software that supports” acknowledges that big data software includes both infrastructure and analytics software; these have been referred to as big throughput and big analytics software, respectively.5 Infrastructure software is needed to store, retrieve, transmit, and process big data. While infrastructure software is essential to developing big data software, much of the emphasis and hype has been placed on the analytics portion. Nonetheless, our definition of big data software encompasses both types of software.

The term “time-constrained” denotes the urgency in providing solutions. In a way, big data software shares a property with real-time software: late responses are wrong responses.

The phrase “continuous information flows” generalizes the input of big data software, which has the unique properties of volume, velocity, and variety. This generalization also extends to other important information properties of big data input, such as continuity (data in motion versus data at rest). Data in motion (or data streams)

[1] Carl E. Rasmussen, et al. The Need for Open Source Software in Machine Learning, 2007, J. Mach. Learn. Res.

[2] Joseph M. Hellerstein, et al. MAD Skills: New Analysis Practices for Big Data, 2009, Proc. VLDB Endow.

[3] Adam Jacobs, et al. The pathologies of big data, 2009, Commun. ACM.

[4] William N. Robinson. A Roadmap for Comprehensive Requirements Modeling, 2010, Computer.

[5] Longbing Cao, et al. Domain-Driven Data Mining: Challenges and Prospects, 2010, IEEE Transactions on Knowledge and Data Engineering.

[6] Geoff Holmes, et al. MOA: Massive Online Analysis, 2010, J. Mach. Learn. Res.

[7] Baowen Xu, et al. Testing and validating machine learning classifiers by metamorphic testing, 2011, J. Syst. Softw.

[8] Charu C. Aggarwal, et al. Detecting Recurring and Novel Classes in Concept-Drifting Data Streams, 2011, 2011 IEEE 11th International Conference on Data Mining.

[9] Bhavani M. Thuraisingham, et al. Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints, 2011, IEEE Transactions on Knowledge and Data Engineering.

[10] Blaine Nelson, et al. Adversarial machine learning, 2011, AISec '11.

[11] Jie Yin, et al. Using Social Media to Enhance Emergency Situation Awareness, 2012, IEEE Intelligent Systems.

[12] Kevin P. Murphy, et al. Machine learning - a probabilistic perspective, 2012, Adaptive computation and machine learning series.

[13] Samuel Madden, et al. From Databases to Big Data, 2012, IEEE Internet Computing.

[14] Veda C. Storey, et al. Business Intelligence and Analytics: From Big Data to Big Impact, 2012, MIS Q.

[15] Zhu Wang, et al. Social and Community Intelligence: Technologies and Trends, 2012, IEEE Software.

[16] Tim Kraska, et al. Finding the Needle in the Big Data Systems Haystack, 2013, IEEE Internet Computing.

[17] Neil A. M. Maiden. Monitoring Our Requirements, 2013, IEEE Software.

[18] Daniel E. O'Leary, et al. Artificial Intelligence and Big Data, 2013, IEEE Intelligent Systems.

[19] Forrest Shull, et al. Getting an Intuition for Big Data, 2013, IEEE Software.

[20] Christopher Ré, et al. Hazy: Making it Easier to Build and Maintain Big-data Analytics, 2013, Commun. ACM.

[21] A. Bifet, et al. A survey on concept drift adaptation, 2014, ACM Comput. Surv.

[22] Joseph K. Liu, et al. Toward efficient and privacy-preserving computing in big data era, 2014, IEEE Network.