Conceptual Analysis of Big Data Using Ontologies and EER

Large amounts of "big data" are generated every day, much of it in a "raw" format that is difficult to analyze and mine. This data potentially contains hidden, meaningful concepts, but much of it is superfluous and of no interest to domain experts. Thus, dealing with big raw data solely by applying distributed computing technologies (e.g., MapReduce, BSP [Bulk Synchronous Parallel], and Spark) and/or distributed storage systems (namely, NoSQL) is generally not sufficient. To enable efficient analysis and mining, the knowledge hidden in the raw data must first be extracted: the data needs to be processed to remove the superfluous parts and generate the meaningful domain-specific concepts. In this paper, we propose a framework that incorporates conceptual modeling and Enhanced Entity-Relationship (EER) principles to effectively extract conceptual knowledge from the raw data so that mining and analysis can be applied to the extracted conceptual data.
