Distributed Knowledge Discovery with Non Linear Dimensionality Reduction

The results of data mining tasks usually improve when the dimensionality of the data is reduced. This improvement, however, is harder to achieve when the data lie on a non-linear manifold and are distributed across network nodes. Although numerous algorithms for distributed dimensionality reduction have been proposed, all assume that the data reside in a linear space. To address the non-linear case, we introduce D-Isomap, a novel distributed non-linear dimensionality reduction algorithm that is particularly applicable in large-scale, structured peer-to-peer networks. Apart from unfolding a non-linear manifold, our algorithm can approximately reconstruct the global dataset at peer level, a very attractive feature for distributed data mining problems. We extensively evaluate its performance through experiments on both artificial and real-world datasets. The obtained results show the suitability and viability of our approach for knowledge discovery in distributed environments.
