Discovery of factors in matrices with grades

We present an approach to decomposition and factor analysis of matrices with ordinal data. The matrix entries are grades to which objects represented by rows satisfy attributes represented by columns, e.g. grades to which an image is red, a product has a given feature, or a person performs well in a test. We assume that the grades form a bounded scale equipped with certain aggregation operators and that this scale conforms to the structure of a complete residuated lattice. We present a greedy approximation algorithm for the problem of decomposing such a matrix into a product of two matrices with grades under the restriction that the number of factors be small. Our algorithm is based on a geometric insight provided by a theorem identifying particular rectangular-shaped submatrices as optimal factors for the decompositions. These factors correspond to formal concepts of the input data and allow an easy interpretation of the decomposition. We present illustrative examples and experimental evaluation.
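To make the notion of a product of two matrices with grades concrete, the following minimal sketch assumes a five-element scale {0, 0.25, 0.5, 0.75, 1} with the Łukasiewicz operations as one example of a complete residuated lattice, and composes an object-factor matrix with a factor-attribute matrix by taking, for each entry, the supremum over factors of the conjunction of grades. The choice of scale, the function names `t_norm`, `residuum` and `compose`, and the example matrices are illustrative assumptions, not taken from the abstract, and the sketch does not implement the greedy algorithm itself.

```python
# Illustrative sketch only: the Lukasiewicz structure on a 5-element scale
# is an assumed example of a complete residuated lattice.

def t_norm(a, b):
    """Lukasiewicz conjunction (the 'multiplication' of the lattice)."""
    return max(0.0, a + b - 1.0)

def residuum(a, b):
    """Lukasiewicz implication, adjoint to t_norm."""
    return min(1.0, 1.0 - a + b)

def compose(A, B):
    """Sup-t-norm product: entry (i, j) is the maximum over factors l
    of t_norm(A[i][l], B[l][j])."""
    n, k, m = len(A), len(B), len(B[0])
    return [
        [max(t_norm(A[i][l], B[l][j]) for l in range(k)) for j in range(m)]
        for i in range(n)
    ]

# Hypothetical data: 3 objects, 2 factors, 3 attributes.
A = [[1.0, 0.0],   # grade to which each factor applies to each object
     [0.5, 1.0],
     [0.0, 0.75]]
B = [[1.0, 0.5, 0.0],   # grade to which each attribute manifests each factor
     [0.0, 1.0, 0.75]]

I = compose(A, B)
for row in I:
    print(row)
```

Each factor contributes one "rectangle" of grades (a column of `A` combined with the matching row of `B`), and the composed matrix is the entrywise maximum of these rectangles, which is the geometric picture the abstract alludes to.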
