Decision support systems form the core of business IT infrastructures because they let companies translate business information into tangible, lucrative results. However, collecting, maintaining, and analyzing large amounts of data poses expensive technical challenges that require organizational commitment. Many commercial tools are available for each of the three major data warehousing tasks: populating the data warehouse from independent operational databases, storing and managing the data, and analyzing the data to make intelligent business decisions. Data cleaning is closely related to heterogeneous data integration, a problem that has been studied for many years. Still, more work is needed to develop domain-independent tools that solve the data cleaning problems that arise in data warehouse development. Most data mining research has focused on algorithms that build more accurate models or build models faster. However, data preparation and mining model deployment present several interesting problems that relate specifically to achieving better synergy between database systems and data mining technology.
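To make the three warehousing tasks concrete, the following is a minimal sketch under several assumptions. The two operational sources, their record layout `(name, city, region, annual_spend)`, and the functions `normalize`, `is_duplicate`, `populate`, and `spend_by_region` are all hypothetical and illustrative. Cleaning uses the sorted-neighborhood method, a well-known duplicate-detection technique for the merge/purge problem. The records are sorted on a key, and each record is compared only with its `window - 1` predecessors. The 0.85 name-similarity threshold is an arbitrary assumption. An in-memory list stands in for the warehouse store, and a group-by sum stands in for analysis.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical operational sources: (name, city, region, annual_spend).
SOURCE_A = [("Alice Smith", "Boston", "East", 120.0),
            ("Bob Jones", "Denver", "West", 80.0)]
SOURCE_B = [("alice  smith", "boston ", "East", 120.0),  # same customer as in A
            ("Carol White", "Austin", "South", 200.0)]


def normalize(row):
    """Canonicalize case and whitespace so records from different sources compare."""
    name, city, region, spend = row
    return (" ".join(name.lower().split()), city.strip().lower(), region, spend)


def is_duplicate(a, b, threshold=0.85):
    """Assumed match rule: same city and sufficiently similar names."""
    return a[1] == b[1] and SequenceMatcher(None, a[0], b[0]).ratio() >= threshold


def populate(sources, window=3):
    """Populate step: merge sources, normalize, then remove duplicates with the
    sorted-neighborhood method (sort by (city, name), compare within a window)."""
    rows = [normalize(r) for source in sources for r in source]
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    warehouse = []  # in-memory stand-in for the warehouse store
    for i, row in enumerate(ordered):
        neighbours = ordered[max(0, i - (window - 1)):i]
        if not any(is_duplicate(row, other) for other in neighbours):
            warehouse.append(row)
    return warehouse


def spend_by_region(warehouse):
    """Analysis step: a simple group-by aggregate over the cleaned data."""
    totals = defaultdict(float)
    for _name, _city, region, spend in warehouse:
        totals[region] += spend
    return dict(totals)


if __name__ == "__main__":
    wh = populate([SOURCE_A, SOURCE_B])
    print(len(wh), spend_by_region(wh))
```

Without the cleaning step, Alice's spend would be counted twice in the East total. This double counting is the kind of error that makes data cleaning a prerequisite for trustworthy decision support. A production tool would need domain-independent match rules rather than the hard-coded ones assumed here.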