Astro-WISE: Chaining to the Universe

The recent explosion of recorded digital data and its processed derivatives threatens to overwhelm researchers when analysing their experimental data or looking up data items in archives and file systems. While current hardware developments allow the acquisition, processing and storage of hundreds of terabytes of data at the cost of a modern sports car, the software systems to handle these data are lagging behind. This problem is very general and is well recognized by various scientific communities; several large projects have been initiated, e.g., DATAGRID/EGEE {http://www.eu-egee.org/} federates compute and storage power over the high-energy physical community, while the international astronomical community is building an Internet geared Virtual Observatory {http://www.euro-vo.org/pub/} (Padovani 2006) connecting archival data. These large projects either focus on a specific distribution aspect or aim to connect many sub-communities and have a relatively long trajectory for setting standards and a common layer. Here, we report first light of a very different solution (Valentijn & Kuijken 2004) to the problem initiated by a smaller astronomical IT community. It provides an abstract scientific information layer which integrates distributed scientific analysis with distributed processing and federated archiving and publishing. By designing new abstractions and mixing in old ones, a Science Information System with fully scalable cornerstones has been achieved, transforming data systems into knowledge systems. This break-through is facilitated by the full end-to-end linking of all dependent data items, which allows full backward chaining from the observer/researcher to the experiment. Key is the notion that information is intrinsic in nature and thus is the data acquired by a scientific experiment. The new abstraction is that software systems guide the user to that intrinsic information by forcing full backward and forward chaining in the data modelling.