An efficient map-reduce algorithm for computing formal concepts from binary data

Discovering all formal concepts embedded in a binary relational dataset is of significant interest in many data analysis and processing problems. Enumerating all concepts of a dataset is known to be NP-hard. A number of Map-Reduce based algorithms have been developed to cope with the difficulty of processing large datasets, but they do not scale well because, at their core, they all rely on a sequential DFS-based search for individual concepts and apply the Map-Reduce formalism only to parallelize the processing at nodes of the DFS tree. We present here a completely different Map-Reduce based formulation for parallelizing the concept discovery problem. One major difference in our approach is that we seek only a sufficient set of concepts, from which all other concepts in the lattice can be generated. This formulation is not well suited to sequential execution but adapts extremely well to parallel environments based on Map-Reduce operators. We show that our algorithm is significantly faster than all known Map-Reduce formulations for discovering concepts in binary relational datasets. We outline the theoretical foundations of our algorithm and report empirical tests on a number of benchmark datasets. We also show that the computationally very difficult problem of finding 3-clusters in pairs of binary relational datasets can be solved efficiently with our formulation.
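To make the object of study concrete, recall the standard definition: a formal concept of a binary relation is a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute of the intent, and the intent is exactly the set of attributes shared by every object of the extent. The following is a minimal brute-force sketch of that definition on a toy dataset. It is not the Map-Reduce algorithm proposed in this paper, it does not compute a sufficient set, and it is exponential in the number of attributes; the function names and the toy context are assumptions introduced purely for illustration.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(g for g, row in context.items() if attrs <= row)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    return frozenset(m for m in all_attrs if all(m in context[g] for g in objs))


def all_concepts(context):
    """Enumerate every formal concept by closing each attribute subset.

    Illustrative brute force only: it visits all 2^|M| attribute subsets.
    """
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for k in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), k):
            a = extent(context, frozenset(subset))  # objects matching the subset
            b = intent(context, a, all_attrs)       # closure of the subset
            concepts.add((a, b))
    return concepts


# Hypothetical toy context: each object maps to the set of attributes it has.
toy_context = {
    "g1": frozenset({"a", "b"}),
    "g2": frozenset({"b", "c"}),
    "g3": frozenset({"a", "b", "c"}),
}

concepts = all_concepts(toy_context)
for ext, itt in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
```

Even on this three-object context, several distinct attribute subsets collapse onto the same concept, which hints at why enumeration strategies that avoid redundant closures matter on large datasets.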
