Big data applications in engineering and science

Research to solve engineering and science problems commonly requires the collection and complex analysis of vast amounts of data. This makes such problems natural exemplars of big data applications. For example, data from weather stations, high-resolution images from CT scans, and data captured by astronomical instruments all readily exhibit one or more big data characteristics, i.e., volume, velocity, variety and veracity. These characteristics present computational and analytical challenges that need to be overcome in order to deliver engineering solutions or make scientific discoveries. In this chapter, we catalogue engineering and science problems that carry a big data angle. We also discuss the research advances for these problems and present a list of tools available to the practitioner. A number of big data application exemplars from the authors' past work are discussed in further depth, highlighting the association between each specific problem and its big data characteristics. The overview from these various perspectives will provide the reader with an up-to-date audit of big data developments in engineering and science.
