The R package ESTHER implements two novel algorithms designed to arrange, in a reduced space, multidimensional objects defined by binary descriptors. The approach is to assign discrete, fixed positions to all possible combinations obtained with the descriptors employed. One of the two algorithms, called clock, positions objects on a circle at regular intervals. The other, star, keeps the angular position used by clock but sets the distance of each object from the center of the circle proportionally to the number of descriptors in state "1". Comparisons with Principal Coordinate Analysis (PCoA) showed that the three methods perform differently according to the number of objects and descriptors and to the distance method employed to carry out the PCoA. The algorithm clock produced the best object clustering in a validation carried out with a matrix generated by molecular fingerprinting of yeast isolates.

An additional problem, which is at the origin of the algorithms proposed in this paper, is that the PCoA graph cannot show what proportion of the possible descriptor combinations is actually in use. In fact, n binary variables can describe a maximum of 2^n possible objects, although the number of biological objects studied is normally well below 2^n. Under normal conditions, each object (excluding repetitions) is represented by one of the 2^n combinations obtained with the n descriptors in use, meaning that only a few combinations are actually present in the matrix. The relationships among these few combinations, and their positions within the complex of all possible combinations, are further aims of the present algorithms. They are named ESTHER after the ancient Persian word meaning star, because of a certain similarity between the point scattering and the stylized depiction of a star.
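The geometry of the two layouts can be illustrated with a few lines of Python. This is a hypothetical sketch and not the ESTHER R code. It assumes that each combination is read as a base-2 integer k, with the first descriptor as the most significant bit. Under that assumption, clock places the combination at angle 2πk/2^n on a unit circle. The source does not state the ordering or the radius that ESTHER actually uses. star keeps the same angle and scales the radius by the fraction of descriptors in state "1". The helper names (combination_index, clock_position, star_position, occupancy) are invented for illustration.

```python
import math


def combination_index(bits):
    """Read a 0/1 sequence as a base-2 integer, first descriptor most significant.

    ASSUMPTION: this ordering of the 2**n combinations is chosen for the sketch;
    the source does not say how ESTHER orders them around the circle.
    """
    index = 0
    for b in bits:
        index = index * 2 + b
    return index


def clock_position(bits):
    """clock-style layout: one of 2**n equally spaced slots on the unit circle."""
    n = len(bits)
    angle = 2 * math.pi * combination_index(bits) / 2 ** n
    return (math.cos(angle), math.sin(angle))


def star_position(bits):
    """star-style layout: clock's angle, radius = fraction of descriptors in state 1."""
    n = len(bits)
    angle = 2 * math.pi * combination_index(bits) / 2 ** n
    radius = sum(bits) / n  # 0 for the all-zero object, 1 for the all-one object
    return (radius * math.cos(angle), radius * math.sin(angle))


def occupancy(objects):
    """Fraction of the 2**n possible combinations present (duplicates counted once)."""
    n = len(objects[0])
    return len({tuple(obj) for obj in objects}) / 2 ** n


if __name__ == "__main__":
    print(clock_position((1, 0, 1)))  # slot 5 of 8, i.e. 225 degrees
    print(star_position((1, 0, 1)))   # same angle, radius 2/3
    print(occupancy([(1, 0, 1), (0, 0, 0), (1, 0, 1)]))  # 2 distinct of 8 -> 0.25
```

With three descriptors there are 2^3 = 8 slots. The object 101 falls in slot 5, at 225 degrees; under the star-style layout it lies at two-thirds of the radius. The occupancy helper quantifies the point made in the text: real data usually fill only a small fraction of the 2^n positions.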
METHODS

General Presentation of the Algorithms Included in ESTHER

ESTHER includes seven functions. Two of them, clock and star, are designed to position objects described by several descriptors within the binary space. Both include an option to evaluate the quality of the point scattering in the two-dimensional graph. Four further functions are auxiliary. One of these, import, is designed to import binary matrices directly. The other three produce binary matrices: fullmat generates matrices with all 2^n combinations, partmat yields only a defined portion of all combinations, and ranmat generates random combinations with the desired number of objects and descriptors. The seventh function, shepard, returns a double plot with the Shepard diagrams of ESTHER and PCoA.
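The following Python sketch gives rough analogues of the matrix generators and of the data behind a Shepard diagram. These are illustrations under stated assumptions, not the ESTHER functions, and all names are hypothetical. A partmat analogue is omitted because the source does not say how its portion is chosen. The Hamming distance is assumed as the dissimilarity between binary objects, and random cells are assumed to be drawn independently with probability 0.5.

```python
import math
import random
from itertools import product


def full_matrix(n_descriptors):
    """All 2**n binary combinations in lexicographic order (a fullmat-like helper)."""
    return [list(row) for row in product((0, 1), repeat=n_descriptors)]


def random_matrix(n_objects, n_descriptors, seed=None):
    """Random 0/1 rows with the requested shape (a ranmat-like helper).

    ASSUMPTION: each cell is drawn independently with probability 0.5.
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_descriptors)] for _ in range(n_objects)]


def hamming(a, b):
    """Number of descriptors on which two binary objects differ (assumed dissimilarity)."""
    return sum(x != y for x, y in zip(a, b))


def shepard_pairs(matrix, layout):
    """(original distance, plotted distance) for every pair of objects.

    `layout` holds one (x, y) point per row of `matrix`, e.g. a clock or star layout.
    """
    pairs = []
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            pairs.append((hamming(matrix[i], matrix[j]), math.dist(layout[i], layout[j])))
    return pairs


if __name__ == "__main__":
    print(len(full_matrix(3)))  # 8
    print(random_matrix(2, 4, seed=0))
    print(shepard_pairs([[0, 0], [1, 1]], [(0, 0), (3, 4)]))  # [(2, 5.0)]
```

To obtain a Shepard diagram, plot the first element of each pair against the second. Points that lie close to a monotonic curve indicate a layout that preserves the original distances well.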