Concept-Based Biclustering for Internet Advertisement

We consider the problem of detecting terms that may be of interest to an advertiser. If a company has already bought advertising terms that describe certain services, it is reasonable for it to find out which terms competing companies have bought. Conversely, a company that provides contextual advertising wants to discover prospective markets, that is, potential advertisers. Both tasks can be addressed by means of so-called biclustering. For a binary relation between firms and terms, the most natural definition of a bicluster is a pair consisting of a subset of firms and a subset of terms, where each firm in the first component buys each term in the second. Formal Concept Analysis (FCA) provides a well-developed notion for this task, the formal concept, whose definition in terms of object-attribute tables is almost equivalent to that of such a bicluster. However, the number of formal concepts (biclusters) for a given dataset can be exponentially large in the worst case. To avoid this difficulty, we propose a new concept-based biclustering method. The new bicluster definition, the (dense) object-attribute bicluster, or simply oa-bicluster, is a relaxation of the notion of a formal concept. We show that the number of (dense) oa-biclusters is no greater than the number of non-empty cells of the initial binary relation. The paper reports experimental results of applying the proposed algorithm to contextual Internet advertisement data, in comparison with several FCA algorithms, together with additional results on so-called morphological metarules for the term recommendation task on the same data.
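The abstract does not spell out the oa-bicluster construction, so the following is only a minimal sketch under a stated assumption: each non-empty cell (firm, term) of the relation generates one candidate bicluster, made of all firms that bought that term and all terms bought by that firm, and its density is the share of filled cells inside that rectangle. The function name `oa_biclusters`, the parameter `min_density`, and the toy data are hypothetical and are not taken from the paper.

```python
def oa_biclusters(relation, min_density=0.0):
    """Enumerate distinct oa-biclusters of a set of (firm, term) pairs.

    Assumption (not stated in the abstract): the bicluster generated by a
    filled cell (firm, term) is (firms that bought `term`,
    terms bought by `firm`).
    """
    terms_of = {}  # firm -> set of terms it bought
    firms_of = {}  # term -> set of firms that bought it
    for firm, term in relation:
        terms_of.setdefault(firm, set()).add(term)
        firms_of.setdefault(term, set()).add(firm)

    seen = set()
    result = []
    for firm, term in relation:
        firms = frozenset(firms_of[term])
        terms = frozenset(terms_of[firm])
        if (firms, terms) in seen:
            continue  # several cells may generate the same bicluster
        seen.add((firms, terms))
        # Density: fraction of cells in the rectangle that are filled.
        filled = sum(1 for f in firms for t in terms if (f, t) in relation)
        density = filled / (len(firms) * len(terms))
        if density >= min_density:
            result.append((firms, terms, density))
    return result


# Hypothetical toy relation: three firms, two advertising terms.
toy_relation = {
    ("firm1", "hotel"), ("firm1", "flight"),
    ("firm2", "hotel"), ("firm2", "flight"),
    ("firm3", "hotel"),
}

all_biclusters = oa_biclusters(toy_relation)
exact_biclusters = oa_biclusters(toy_relation, min_density=1.0)
```

Because every bicluster is generated by some filled cell, the loop can never produce more biclusters than there are pairs in `relation`, which mirrors the bound stated in the abstract. Setting `min_density=1.0` keeps only rectangles that are completely filled.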
