Factor Analysis of Incidence Data via Novel Decomposition of Matrices

Matrix decomposition methods represent an object-variable data matrix as a product of two matrices: one describes the relationship between objects and hidden variables, or factors, and the other describes the relationship between the factors and the original variables. We present a novel approach to the decomposition and factor analysis of matrices with incidence data. The matrix entries are grades to which the objects represented by rows satisfy the attributes represented by columns, e.g. the grade to which an image is red or to which a person performs well in a test. We assume that the grades belong to a scale bounded by 0 and 1 that is equipped with certain aggregation operators and forms a complete residuated lattice. We present an approximation algorithm for decomposing such a matrix with grades into a product of two matrices with grades, using as few factors as possible. Decomposition of binary matrices into Boolean products of binary matrices is the special case of this problem in which 0 and 1 are the only grades. Our algorithm is based on a geometric insight provided by a theorem that identifies particular rectangular-shaped submatrices as optimal factors for the decompositions. These factors correspond to formal concepts of the input data, which makes the decomposition easy to interpret. We present the problem formulation, the basic geometric insight, the algorithm, an illustrative example, and an experimental evaluation.
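The abstract leaves the matrix product and the factor search implicit. The sketch below is an illustration under stated assumptions, not the paper's implementation. It assumes a three-element scale {0, 1/2, 1} with the Łukasiewicz operations (the paper admits any complete residuated lattice) and uses the composition (A ∘ B)_ij = max_l A_il ⊗ B_lj. Each factor is a formal concept (C, D) built with the graded concept-forming operators. The greedy strategy, which grows each factor's intent one attribute degree at a time while the number of newly covered cells increases, is our own assumption and is not necessarily the exact procedure of the paper. The names `SCALE`, `compose`, `up`, `down`, `greedy_factors` and `to_matrices` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed scale {0, 1/2, 1} with Lukasiewicz operations (illustration only;
# any complete residuated lattice would do). Fractions keep arithmetic exact.
SCALE = [Fraction(0), Fraction(1, 2), Fraction(1)]


def tnorm(a, b):
    """Lukasiewicz multiplication: a (x) b = max(0, a + b - 1)."""
    return max(Fraction(0), a + b - 1)


def residuum(a, b):
    """Lukasiewicz residuum: a -> b = min(1, 1 - a + b)."""
    return min(Fraction(1), 1 - a + b)


def compose(A, B):
    """(A o B)_ij = max_l A_il (x) B_lj; assumes at least one factor."""
    k, m = len(B), len(B[0])
    return [[max(tnorm(row[l], B[l][j]) for l in range(k)) for j in range(m)]
            for row in A]


def up(I, C):
    """C^up(y) = min_x (C(x) -> I_xy): degree to which all objects in C have y."""
    return [min(residuum(C[i], I[i][j]) for i in range(len(I)))
            for j in range(len(I[0]))]


def down(I, D):
    """D^down(x) = min_y (D(y) -> I_xy): degree to which x has all attributes in D."""
    return [min(residuum(D[j], I[i][j]) for j in range(len(I[0])))
            for i in range(len(I))]


def greedy_factors(I, scale=SCALE):
    """Hypothetical greedy search: add concepts until every nonzero cell is covered."""
    n, m = len(I), len(I[0])
    uncovered = {(i, j) for i in range(n) for j in range(m) if I[i][j] > 0}
    factors = []

    def covers(C, D):
        # A concept never exceeds I, so a cell is covered once C_i (x) D_j reaches I_ij.
        return {(i, j) for (i, j) in uncovered if tnorm(C[i], D[j]) >= I[i][j]}

    while uncovered:
        D = [Fraction(0)] * m
        C, cov = down(I, D), set()
        while True:
            improved = False
            for j, a in product(range(m), scale):
                if a <= D[j]:
                    continue  # a/j would not enlarge the current intent
                cand = D.copy()
                cand[j] = a
                C2 = down(I, cand)  # extent generated by the candidate intent
                D2 = up(I, C2)      # its closure, i.e. (D with a/j)^{down up}
                cov2 = covers(C2, D2)
                if len(cov2) > len(cov):
                    C, best_D, cov, improved = C2, D2, cov2, True
            if not improved:
                break
            D = best_D
        factors.append((C, D))
        uncovered -= cov
    return factors


def to_matrices(factors):
    """Extents become the columns of A (objects x factors), intents the rows of B."""
    A = [list(row) for row in zip(*(C for C, _ in factors))]
    B = [D for _, D in factors]
    return A, B


if __name__ == "__main__":
    h = Fraction(1, 2)
    raw = [[1, 1, h, 0], [1, 1, h, 0], [0, h, 1, 1], [0, 0, 1, 1]]
    I = [[Fraction(v) for v in row] for row in raw]
    A, B = to_matrices(greedy_factors(I))
    print(len(B), "factors; exact:", compose(A, B) == I)
```

Because every factor is a concept, each rectangle C ⊗ D lies below the input matrix, so the max-composition cannot overshoot. The loop stops only when every nonzero cell is reached, so the decomposition is exact. Passing `scale=[Fraction(0), Fraction(1)]` with a 0/1 matrix reduces the operations to Boolean conjunction and implication, which is the Boolean special case the abstract mentions.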