Dataspace Management for Large Data Sets

Ideally, Big Data analysis enables us to learn relevant and interesting facts from large, interconnected data sets. Dataspace support platforms and dataspace management systems have been proposed to help analysts bring together data related to their interests. In this paper, we present an example of such a platform. In addition to storing data and descriptions of its characteristics, the platform supports verifying the compatibility (and eventually the summarizability) of the underlying data. This helps analysts discover mistakes and prevents meaningless aggregations. To illustrate the platform’s use, we present a case involving large data sets (tens of millions of observations), describe how the data sets can be used, and study the platform’s performance.
