Query Optimization using SQL Approach for Data Mining Analysis

Relational databases are a widely accepted repository for structured data, and integrating data mining algorithms with a relational DBMS is an important research issue for database programmers. In a relational database, significant effort is required to prepare a summary data set that can serve as input to the data mining process: it typically takes many complex SQL queries that join tables and aggregate columns. This paper reviews research on extending SQL for data mining processing, together with related work on query optimization. It also proposes three approaches: transposition, pivoting, and cross-tabulation. These approaches allow efficient optimization through SQL extensions based on aggregate queries.
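As a rough illustration of the pivoting idea, the sketch below turns a "vertical" table (one row per store and day) into a "horizontal" summary (one row per store, one column per day) using a standard SUM(CASE ...) aggregation. The table `sales`, its columns, and the sample rows are hypothetical and chosen only for this example; this is a generic SQL pattern run through Python's built-in sqlite3 module, not necessarily the exact method proposed in the paper.

```python
import sqlite3

# In-memory database with a small, made-up "vertical" fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (store_id INTEGER, day TEXT, amount REAL);
INSERT INTO sales VALUES
  (1, 'Mon', 10.0), (1, 'Tue', 20.0), (1, 'Mon', 5.0),
  (2, 'Mon', 7.0),  (2, 'Wed', 3.0);
""")

# Pivot: each distinct value of `day` becomes its own aggregated column.
# Days with no matching rows contribute 0.0 to the sum.
pivot_sql = """
SELECT store_id,
       SUM(CASE WHEN day = 'Mon' THEN amount ELSE 0.0 END) AS mon,
       SUM(CASE WHEN day = 'Tue' THEN amount ELSE 0.0 END) AS tue,
       SUM(CASE WHEN day = 'Wed' THEN amount ELSE 0.0 END) AS wed
FROM sales
GROUP BY store_id
ORDER BY store_id
"""

rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

Each output row has one entry per store and one numeric column per day, which is the kind of tabular, one-record-per-entity layout that most data mining algorithms expect as input. The column list is hard-coded here; in practice it would have to be generated from the distinct values of the pivoting column.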
