BRIEF REPORTS Open Access

Background: With the advent of high-throughput genomics and high-resolution imaging, biology and medicine increasingly need parallel computing. Computing hardware is now cheap enough that even small labs or individuals can cost-effectively build their own personal compute cluster.

Methods: Here we briefly describe how to build a low-cost, high-performance compute cluster from commodity hardware. We also provide an in-depth example, with sample code, of running R jobs in parallel using MOSIX, a mature extension of the Linux kernel for parallel computing. A similar process can be used with other cluster platform software.

Results: As a statistical genetics example, we use our cluster to run a simulated expression quantitative trait locus (eQTL) experiment. Like many statistical genetics applications, eQTL analysis is computationally intensive but conceptually easy to parallelize, so running it in parallel with MOSIX gives a linear speedup in analysis time with little additional effort.

Conclusions: We have used MOSIX to run a wide variety of software programs in parallel with good results. We discuss the limitations and benefits of MOSIX and compare it with other platforms.
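The parallel strategy described in the Results paragraph can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R/MOSIX sample code: genes are split into independent chunks, each chunk is scanned against every SNP, and the per-chunk results are merged. Local worker processes from `concurrent.futures.ProcessPoolExecutor` stand in for the separate jobs that MOSIX would distribute across cluster nodes. The simulated data (allele counts 0/1/2, one causal SNP per gene with a fixed effect size), the correlation-based "best SNP per gene" test, and all function names (`simulate`, `scan_chunk`, `split`, `run_parallel`) are hypothetical.

```python
# Sketch (hypothetical names and data): an embarrassingly parallel eQTL scan.
# Assumption: every (gene, SNP) test is independent, so genes can be split
# into chunks that run as separate processes. Local worker processes stand in
# for the separate R jobs a MOSIX cluster would distribute across nodes.
import random
from concurrent.futures import ProcessPoolExecutor


def simulate(n_samples, n_snps, n_genes, seed=1):
    """Simulate genotypes (allele counts 0/1/2) and gene expression.

    Hypothetical design: gene g is driven by SNP g % n_snps plus N(0, 1) noise.
    """
    rng = random.Random(seed)
    geno = [[rng.choice((0, 1, 2)) for _ in range(n_samples)]
            for _ in range(n_snps)]
    expr = [[1.5 * geno[g % n_snps][i] + rng.gauss(0.0, 1.0)
             for i in range(n_samples)]
            for g in range(n_genes)]
    return geno, expr


def corr(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def scan_chunk(job):
    """Test every SNP against each gene in one chunk; map gene -> best SNP."""
    gene_ids, geno, expr = job
    return {g: max(range(len(geno)), key=lambda s: abs(corr(geno[s], expr[g])))
            for g in gene_ids}


def split(items, n_jobs):
    """Partition items into n_jobs contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_jobs)
    chunks, start = [], 0
    for j in range(n_jobs):
        end = start + size + (1 if j < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_parallel(geno, expr, n_jobs):
    """Scan all genes with n_jobs worker processes, then merge the results."""
    jobs = [(chunk, geno, expr)
            for chunk in split(list(range(len(expr))), n_jobs)]
    merged = {}
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        for part in pool.map(scan_chunk, jobs):
            merged.update(part)
    return merged
```

Calling `run_parallel(geno, expr, n_jobs=4)` on data from `simulate(...)` should return the same gene-to-SNP mapping as a single `scan_chunk` call over all genes: chunking changes only how the work is distributed, not the answer, which is why this kind of scan parallelizes so easily. When running it as a script, keep the `run_parallel` call under an `if __name__ == "__main__":` guard so platforms that spawn worker processes do not re-execute it. On a cluster, each chunk would instead be launched as its own job.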
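One standard way to make the linear-speedup claim precise is Amdahl's law. It assumes that a fraction $f$ of the serial run time $T_1$ can be divided evenly among $p$ processes while the remainder stays serial, and it ignores communication overhead:

$$S(p) = \frac{T_1}{T_p} = \frac{1}{(1 - f) + f/p}$$

If, as for an eQTL scan split by gene, nearly all of the work is parallelizable ($f \to 1$), then $S(p) \to p$, i.e. linear speedup. With $f = 0.99$ and $p = 16$, for example, $S = 1/(0.01 + 0.061875) \approx 13.9$. In practice, job start-up and process migration add overhead that this simple model leaves out.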