Paper information - Collective Principal Component Analysis from Distributed, Heterogeneous Data

Principal component analysis (PCA) is a statistical technique for identifying the dependency structure of multivariate stochastic observations, and it is frequently used in data mining applications. This paper considers PCA in the context of emerging network-based computing environments. It offers a technique for performing PCA on distributed, heterogeneous data sets with relatively small communication overhead. The technique is evaluated on several data sets, including one from a web mining application. This approach is likely to facilitate the development of distributed clustering, associative link analysis, and other heterogeneous data mining applications that frequently use PCA.
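The abstract does not spell out the steps of the technique, so the following is only a minimal sketch of one way PCA over heterogeneous (column-partitioned) data could keep communication small. It assumes each site holds different features for the same rows, runs a local PCA, and ships only its mean, its top-k local eigenvectors, and the projections of a shared row sample to a central site. The function names `local_summary` and `collective_pca`, the parameter `k`, and the sampling scheme are hypothetical and are not taken from the paper.

```python
import numpy as np


def local_summary(X_local, k, sample_idx):
    """Run at one site (assumed scheme): local PCA, then project a row sample.

    Returns the local mean, the top-k local eigenvectors, and the sampled rows
    projected onto those eigenvectors. Only these three arrays leave the site.
    """
    mean = X_local.mean(axis=0)
    Xc = X_local - mean
    # The covariance matrix is symmetric, so eigh is the appropriate solver.
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the k largest.
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    projected = Xc[sample_idx] @ top
    return mean, top, projected


def collective_pca(summaries, n_components):
    """Run at the central site: rebuild approximate rows and do a global PCA."""
    # Map each site's projections back to its own (centered) feature space.
    blocks = [proj @ top.T for _, top, proj in summaries]
    # Sites hold different columns of the same rows, so blocks sit side by side.
    Z = np.hstack(blocks)
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic correlated data: 500 rows, 6 features split across two sites.
    X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
    sites = [X[:, :4], X[:, 4:]]
    sample_idx = rng.choice(500, size=100, replace=False)  # shared row sample
    summaries = [local_summary(s, 2, sample_idx) for s in sites]
    eigvals, eigvecs = collective_pca(summaries, n_components=2)
    print("approximate leading eigenvalues:", eigvals)
```

In this sketch a site with n rows and d local features sends roughly d + d·k + m·k numbers (m being the sample size) instead of n·d, which is where the communication saving would come from. When k equals the local feature count and the sample covers every row, the result coincides with ordinary centralized PCA; smaller k or m trades accuracy for bandwidth.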
