Parallel distance matrix computation for Matlab data mining

The paper presents utility functions for computing of a distance matrix, which plays a crucial role in data mining. The goal in the design was to enable operating on relatively large datasets by overcoming basic shortcoming - computing time - with an interface easy to use. The presented solution is a set of functions, which were created with emphasis on practical applicability in real life. The proposed solution is presented along the theoretical background for the performance scaling. Furthermore, different approaches of the parallel computing are analyzed, including shared memory, which is uncommon in Matlab environment.

[1]  Evangelos P. Markatos,et al.  Load Balancing vs. Locality Management in Shared-Memory Multiprocessors , 1992, ICPP.

[2]  Mark D. Hill,et al.  Amdahl's Law in the Multicore Era , 2008 .

[3]  Frederica Darema,et al.  The SPMD Model : Past, Present and Future , 2001, PVM/MPI.

[4]  K. Mani Chandy,et al.  Compositional C++: Compositional Parallel Programming , 1992, LCPC.

[5]  Evangelos P. Markatos,et al.  Shared memory vs. message passing in shared-memory multiprocessors , 1992, [1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing.

[6]  Gaurav Sharma,et al.  MATLAB®: A Language for Parallel Computing , 2009, International Journal of Parallel Programming.

[7]  Michal Kozielski,et al.  Investigating human color harmony preferences using unsupervised machine learning , 2012, CGIV.

[8]  Max Bramer,et al.  Principles of Data Mining , 2013, Undergraduate Topics in Computer Science.