ENGLISH ABSTRACT: The analysis of multidimensional (multivariate) data sets is an important area of
research in applied statistics. Over the decades many techniques have been developed to
deal with such data sets. These multivariate techniques include
inferential analysis, regression analysis, discriminant analysis, cluster analysis and many
exploratory methods. Most of these methods deal with cases where the data contain
numerical variables. However, there are also powerful methods in the literature that deal
with multidimensional binary and count data.
The primary purpose of this thesis is to discuss the exploratory and inferential techniques
that can be used for binary and count data. Chapter 2 describes correspondence analysis
and canonical correspondence analysis in detail. These methods are used
to analyze data in contingency tables. Chapter 3 is devoted to cluster analysis. In this
chapter we explain four well-known clustering methods and we also discuss the distance
(dissimilarity) measures available in the literature for binary and count data. Chapter 4
contains an explanation of metric and non-metric multidimensional scaling. These
methods can be used to represent binary or count data in a lower dimensional Euclidean
space. In Chapter 5 we give a method for inferential analysis called the analysis of
distance. This method uses reasoning similar to that of the analysis of variance, but the
inference is based on a pseudo F-statistic with the p-value obtained using permutations of
the data. Chapter 6 contains real-world applications of the above methods to two
data sets, the Biolog data and the Barents Fish data.
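
The permutation-based inference summarized above can be illustrated with a minimal sketch. It is not the R code demonstrated in the thesis. It assumes the Bray-Curtis dissimilarity as the distance measure for count data and the sums-of-squared-distances form of the pseudo F-statistic that is common in permutation-based distance analysis; the thesis may define either differently. The count table, the group labels and the function names are hypothetical.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors (assumed measure)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def pseudo_f(dist, groups):
    """Pseudo F-statistic computed from pairwise distances and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity inside each group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_test(dist, groups, n_perm=999, seed=0):
    """Return the observed pseudo F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dist, shuffled) >= observed:
            count += 1
    # The observed labelling is counted as one of the permutations.
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical count table: 8 samples (rows) by 3 species (columns).
counts = [
    [10, 0, 1], [12, 1, 0], [9, 0, 2], [11, 1, 1],
    [0, 8, 5], [1, 9, 4], [0, 7, 6], [2, 10, 5],
]
groups = ["A"] * 4 + ["B"] * 4

dist = [[bray_curtis(r, s) for s in counts] for r in counts]
f_obs, p_value = permutation_test(dist, groups)
print(f"pseudo F = {f_obs:.2f}, permutation p-value = {p_value:.3f}")
```

Because the p-value comes from reshuffling the group labels, no distributional assumption is placed on the counts; only the choice of dissimilarity measure and the number of permutations need to be specified.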
The secondary purpose of the thesis is to demonstrate how the above techniques can be
performed in the software package R. Several R packages and functions are discussed
throughout this thesis. The usage of these functions is also demonstrated with appropriate
examples. Attention is also given to the interpretation of the output and graphics. The
thesis ends with some general conclusions and ideas for further research.

AFRIKAANSE OPSOMMING (English translation): The analysis of multidimensional (multivariate) data sets is an important area of
research in applied statistics. Over the past decades various techniques have been
developed to analyze such data. The multivariate techniques that have been developed
include inferential analysis, regression analysis, discriminant analysis, cluster analysis and many more
exploratory data analysis techniques. The majority of these methods handle cases
where the data contain numerical variables. There are also powerful methods in the
literature for the analysis of multidimensional binary and count data.
The primary purpose of this thesis is to discuss techniques for exploratory and inferential analysis
of binary and count data. In Chapter 2 of this thesis we discuss
correspondence analysis and canonical correspondence analysis. These methods are used on
data in…