The Astrolabe Project: Identifying and Curating Astronomical Dark Data through Development of Cyberinfrastructure Resources

As research datasets and analyses grow in complexity, data that could be valuable to other researchers, and that could support the integrity of published work, remain uncurated across disciplines. These data are especially concentrated in the Long Tail of funded research, where curation resources and related expertise are often inaccessible. In the domain of astronomy, it is undisputed that uncurated dark data exist, but the scope of the problem remains uncertain. The Astrolabe Project is a collaboration among University of Arizona researchers, the CyVerse cyberinfrastructure environment, and the American Astronomical Society. Its mission is to identify and ingest previously uncurated astronomical data, to provide a robust computational environment for analyzing and sharing data, and to offer services for authors wishing to deposit data associated with publications. Following expert feedback obtained through two workshops held in 2015 and 2016, Astrolabe is funded in part by the National Science Foundation. The system is being actively developed within CyVerse, and Astrolabe collaborators are soliciting heterogeneous datasets and potential users for the prototype system. Astrolabe team members are currently working to characterize the properties of uncurated astronomical data and to develop automated methods for locating potentially useful data to be targeted for ingest into Astrolabe, while cultivating a user community for the new data management system.
