Greedy Discovery of Ordinal Factors

In large datasets, it is hard to discover and analyze structure. It is thus common to introduce tags or keywords for the items. In applications, such datasets are then filtered based on these tags. Still, even medium-sized datasets with a few tags result in complex systems that are hard for humans to navigate. In this work, we adopt the method of ordinal factor analysis to address this problem. An ordinal factor arranges a subset of the tags in a linear order based on their underlying structure. A complete ordinal factorization, which consists of such ordinal factors, precisely represents the original dataset. Based on such an ordinal factorization, we provide a way to discover and explain relationships between different items and attributes in the dataset. However, computing even a single ordinal factor of high cardinality is computationally complex. We thus propose a greedy algorithm in this work. This algorithm extracts ordinal factors using existing fast algorithms developed in formal concept analysis. We then leverage this algorithm to propose a comprehensive way to discover relationships in the dataset. Furthermore, we introduce a distance measure, based on the representation that emerges from the ordinal factorization, to discover similar items. To evaluate the method, we conduct a case study on different datasets.
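
To make the notion of an ordinal factor more concrete, the following minimal sketch greedily builds one chain of formal concepts from a small item-tag table. It is an illustration only and not the algorithm proposed in this work: the function names (`extent`, `intent`, `greedy_chain`, `covered`), the toy data, and the gain heuristic (number of newly covered item-tag pairs) are all assumptions made for this example.

```python
def extent(context, attrs):
    """Items that carry every tag in attrs."""
    return {g for g, tags in context.items() if attrs <= tags}


def intent(context, objs, all_attrs):
    """Tags shared by every item in objs."""
    shared = set(all_attrs)
    for g in objs:
        shared &= context[g]
    return shared


def greedy_chain(context):
    """Greedily build one chain of formal concepts (a candidate ordinal factor).

    Assumed heuristic: at each step, add the tag whose induced concept covers
    the most item-tag pairs not yet covered by the chain.
    """
    all_attrs = set().union(*context.values())
    objs = set(context)
    attrs = intent(context, objs, all_attrs)
    chain = [(frozenset(objs), frozenset(attrs))]
    while True:
        best = None
        for m in sorted(all_attrs - attrs):
            new_objs = extent(context, attrs | {m})
            if not new_objs:
                continue  # no item carries all of these tags
            new_attrs = intent(context, new_objs, all_attrs)
            gain = len(new_objs) * len(new_attrs - attrs)
            if best is None or gain > best[0]:
                best = (gain, new_objs, new_attrs)
        if best is None:
            return chain
        _, objs, attrs = best
        chain.append((frozenset(objs), frozenset(attrs)))


def covered(chain):
    """Item-tag pairs represented by the chain."""
    return {(g, m) for objs, attrs in chain for g in objs for m in attrs}


# Toy data (hypothetical): items mapped to their tags.
context = {
    "a": {"x"},
    "b": {"x", "y"},
    "c": {"x", "y", "z"},
    "d": {"w"},
}
chain = greedy_chain(context)
for objs, attrs in chain:
    print(sorted(objs), sorted(attrs))
```

Along the chain, the tag sets grow while the item sets shrink, which yields the linear order of tags that an ordinal factor describes. Because the choice at each step is greedy, a single chain need not cover every item-tag pair; a complete factorization would repeat the extraction on the pairs that remain uncovered.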
