Parallel Processing and Applied Mathematics

We present a range of new single-pass streaming algorithms for incremental principal components analysis (IPCA) and show that they are more effective than existing ones. IPCA algorithms process the columns of a matrix A one at a time and attempt to build a basis for a low-dimensional subspace that spans the dominant subspace of A. We present a unified framework for IPCA algorithms, show that many existing algorithms are parameterizations of it, propose new, sophisticated algorithms, and show that both the new algorithms and many existing ones can be implemented more efficiently than was previously known. We also show that many existing algorithms can fail even in easy cases, and we show experimentally that our new algorithms outperform existing ones.
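To make the column-at-a-time setting concrete, the following is a minimal sketch of one generic streaming step, assuming a textbook incremental-SVD style update with truncation to rank k. It is not the unified framework or any of the specific algorithms summarized above; the names `ipca_update`, `streaming_pca`, and the tolerance `1e-12` are hypothetical choices made for illustration.

```python
import numpy as np

def ipca_update(U, s, a, k):
    """Fold one new column `a` into an orthonormal basis U (n x m) with weights s (m,).

    Illustrative assumption: a plain incremental-SVD update followed by
    truncation to at most k directions.
    """
    m = s.size
    c = U.T @ a                # coefficients of a in the current basis
    r = a - U @ c              # residual orthogonal to the current basis
    rho = np.linalg.norm(r)

    # Small (m+1) x (m+1) core matrix; its SVD rotates the extended basis.
    K = np.zeros((m + 1, m + 1))
    K[:m, :m] = np.diag(s)
    K[:m, m] = c
    K[m, m] = rho

    # Extend the basis with the normalized residual. If `a` already lies
    # (numerically) in span(U), use a zero column instead; its weight is ~0,
    # so it ends up in the trailing direction of the core SVD.
    q = r / rho if rho > 1e-12 else np.zeros_like(a)
    Uk, sk, _ = np.linalg.svd(K)
    U_new = np.hstack([U, q[:, None]]) @ Uk

    # Keep only the k dominant directions.
    return U_new[:, :k], sk[:k]

def streaming_pca(A, k):
    """Single pass over the columns of A, keeping at most k basis vectors."""
    n = A.shape[0]
    U, s = np.zeros((n, 0)), np.zeros(0)
    for j in range(A.shape[1]):
        U, s = ipca_update(U, s, A[:, j], k)
    return U, s
```

Calling `streaming_pca(A, k)` touches each column of `A` exactly once and stores only an n-by-k basis plus k weights. Truncating at every step discards information whenever `A` has more than k significant directions, so an update of this kind is only an approximation in general.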
