HiPerData: An autonomous large-scale model building and management platform for big data analytics

Data mining is a difficult task that relies on exploratory, analytic processing of large quantities of data to discover meaningful patterns and derive valuable insights. The increasing heterogeneity and complexity of data require expert knowledge of how to combine multiple data mining techniques to process and analyze the data effectively and efficiently. This paper presents HiPerData, a distributed architecture for automated data processing and mining that combines large-scale computational resource management, model building and selection, and predictive and inference analysis. We illustrate the approach with two data mining tasks in which the construction of the data mining knowledge flow is automated using standards defined by the data mining and automated planning communities.
