THE SIMCAL FAMILY OF ALGORITHMS FOR ANALYSIS OF MICROARRAY DATA

SiMCAL 1, 2, and 3 (Simple Multilevel Clustering And Linking, versions 1, 2, and 3) are novel clustering algorithms for the analysis of microarray data. They aim to provide a complete feature set not found in Jarvis-Patrick clustering, from which the original SiMCAL concept is derived, or in other popular clustering methods such as hierarchical and k-means clustering. Although each algorithm in the SiMCAL family has distinct features and methods, all three share the following attributes: they are simple, in that they are computationally inexpensive; they are multilevel, in that they provide a small number of clearly defined hierarchical levels of clusters; and they offer linking between clusters at the same level in each hierarchy. Presented here are the design, development, and analysis of the algorithms; their application to two types of microarray data, one involving the phosphatidylserine receptor (PSR) and the other involving cystic fibrosis (CF); a description of the Web-based interface for visualizing results; and possible avenues for further development. Code and data are available at http://www.dvorkin.com/daniel/Simcal123.zip under an open-source license.
