Multiple factor analysis: principal component analysis for multitable and multiblock data sets

Multiple factor analysis (MFA, also called multiple factorial analysis) is an extension of principal component analysis (PCA) tailored to handle multiple data tables that measure sets of variables collected on the same observations or, alternatively (in dual‐MFA), multiple data tables in which the same variables are measured on different sets of observations. MFA proceeds in two steps. First, it computes a PCA of each data table and ‘normalizes’ each table by dividing all its elements by the first singular value obtained from its PCA. Second, it aggregates all the normalized data tables into a grand data table, which is analyzed via a (non‐normalized) PCA that gives a set of factor scores for the observations and loadings for the variables. In addition, MFA provides, for each data table, a set of partial factor scores for the observations that reflects the specific ‘view‐point’ of that table. Interestingly, the common factor scores can also be obtained by replacing the original normalized data tables with the normalized factor scores obtained from the PCA of each of these tables. In this article, we present MFA, review recent extensions, and illustrate it with a detailed example. WIREs Comput Stat 2013, 5:149–179. doi: 10.1002/wics.1246
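The two steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the article's reference implementation: the function name `mfa` and its return layout are hypothetical, each table is assumed to be column-centered only (no per-column scaling), and observation masses are taken as equal and omitted, so the scores may differ from published values by a constant scaling factor. Partial factor scores are computed here as the number of tables times the projection of each normalized table onto its own block of loadings, so that their average equals the common factor scores.

```python
import numpy as np


def mfa(tables):
    """Sketch of MFA for a list of (n_obs x p_k) arrays sharing the same rows.

    Assumptions (not taken from the source): columns are centered only,
    and all observations carry equal mass.
    """
    # Step 1: PCA (via SVD) of each table; divide by its first singular value.
    normalized = []
    for X in tables:
        Xc = X - X.mean(axis=0)                        # center each column
        s1 = np.linalg.svd(Xc, compute_uv=False)[0]    # first singular value
        normalized.append(Xc / s1)

    # Step 2: non-normalized PCA of the concatenated (grand) table.
    Z = np.hstack(normalized)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s          # common factor scores for the observations
    loadings = Vt.T         # loadings for all variables, stacked by table

    # Partial factor scores: each table's own 'view-point' on the observations.
    K = len(tables)
    partial = []
    start = 0
    for Zk in normalized:
        stop = start + Zk.shape[1]
        partial.append(K * Zk @ loadings[start:stop])
        start = stop

    return scores, loadings, partial
```

Dividing each table by its first singular value gives every normalized table a first singular value of 1, so no single table can dominate the first component of the grand analysis simply because it has more variables or larger variance.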
