Enabling scientific data sharing and re-use

Higher sensor throughput has increased the demand for cyberinfrastructure, requiring those unfamiliar with large database management to acquire new skills or outsource. Some have called this shift away from sensor-limited data collection the “data deluge.” As an alternative view, we propose that the deluge is the result of sensor control software failing to keep pace with hardware capabilities. Rather than exploiting the potential of powerful embedded operating systems to construct intelligent sensor networks that harvest higher quality data, the old paradigm (i.e., collect everything) is still dominant. To mitigate the deluge, we present an adaptive sampling algorithm based on the Nyquist-Shannon sampling theorem. We calibrate the algorithm for both data reduction and increased sampling over “hot moments,” which we define as periods of elevated signal activity. This departs from previous work, which has emphasized adaptive sampling for data compression via minimization of signal reconstruction error. Under the feature extraction concept, samples drawn from user-defined events carry greater importance, and effective control requires the researcher to describe the context of events in the form of both an identification heuristic (for calibration) and a real-time sampling model. This event-driven approach is important when observation is focused on intermittent dynamics. In our case study application, we develop a heuristic to identify hot moments from historical data and use it to train and evaluate the adaptive model in an offline analysis using soil moisture data. Results indicate that the adaptive model is superior to uniform sampling, extracting 20% to 100% more samples during hot moments at equivalent levels of overall efficiency.

Research data sharing is one of the key challenges in the e-science era. Information technologies facilitate enhanced management and sharing of research data. It is crucial to understand the current status of research data sharing in order to facilitate enhanced data sharing in the future. In this study, a conceptual model has been developed to characterize the process of data sharing and the factors which give rise to variations in data re-use. The study goes beyond a solely technical analysis and also includes psychological, social, organizational, legal and political components. The model was developed based on the literature and 21 face-to-face interviews with research, funding, data centre and publishing experts. It was validated by both a vigorous workshop and a further 55 structured telephone interviews. The overall model identifies sub-models of process, of context, and of drivers, barriers and enablers. These provide a comprehensive description of the factors that enable or inhibit the sharing of research data. They affect whether data are shared, how they are shared, and how successfully they are shared. Implementing the enablers will help the research community overcome the barriers to data re-use and so facilitate future e-science endeavors.
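
The adaptive sampling approach described above is stated only at a high level, so the following is a minimal sketch of how a Nyquist-Shannon-based rate controller might look, not the authors' algorithm. It assumes a fixed-length window of recent, evenly spaced readings; the function names, the `power_fraction` and `noise_floor` thresholds, and the `margin` factor are all hypothetical choices made for illustration.

```python
import cmath
import math


def highest_significant_frequency(window, sample_rate_hz,
                                  power_fraction=0.05, noise_floor=1e-9):
    """Return the highest frequency (Hz) whose DFT power is at least
    `power_fraction` of the peak non-DC power in `window`.

    Returns 0.0 when the peak power is below `noise_floor` (a quiet signal).
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]  # remove the DC component
    powers = []
    for k in range(1, n // 2 + 1):  # bins up to the Nyquist frequency
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        powers.append((k * sample_rate_hz / n, abs(coeff) ** 2))
    peak = max(p for _, p in powers)
    if peak < noise_floor:
        return 0.0
    return max(f for f, p in powers if p >= power_fraction * peak)


def next_sample_rate(window, sample_rate_hz, min_rate_hz, max_rate_hz,
                     margin=2.5):
    """Pick the next sampling rate as `margin` times the highest significant
    frequency (margin > 2, following the Nyquist criterion), clamped to the
    assumed hardware limits `min_rate_hz` and `max_rate_hz`."""
    f_max = highest_significant_frequency(window, sample_rate_hz)
    return min(max_rate_hz, max(min_rate_hz, margin * f_max))


# Usage: a 4 Hz sine observed at 32 Hz, then a flat (quiet) signal.
active = [math.sin(2 * math.pi * 4 * i / 32) for i in range(32)]
quiet = [0.25] * 32
rate_active = next_sample_rate(active, 32.0, min_rate_hz=1.0, max_rate_hz=100.0)
rate_quiet = next_sample_rate(quiet, 32.0, min_rate_hz=1.0, max_rate_hz=100.0)
```

In this sketch the rate rises when the recent window contains higher-frequency content (a stand-in for a "hot moment") and falls back to the minimum when the signal is quiet. One limitation to keep in mind: a window sampled at a given rate can only reveal frequencies up to half that rate, so a controller of this kind cannot by itself detect activity that is already being aliased.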
