Ranked Tiling

Tiling is a well-known pattern mining technique. Traditionally, it discovers large areas of ones in binary databases or matrices, where an area is defined by a set of rows and a set of columns. In this paper, we introduce the novel problem of ranked tiling, which is concerned with finding interesting areas in ranked data. In such data, each transaction (row) defines a complete ranking of the columns. Ranked data occurs naturally in applications such as sports and other competitions. It is also a useful abstraction for numeric data whose rows are not directly comparable. We introduce a scoring function for ranked tiling, as well as an algorithm based on constraint programming and optimization principles. We empirically evaluate the approach on both synthetic and real-life datasets, and demonstrate the applicability of the framework in several case studies. One case study involves a heterogeneous dataset used to discover biomarkers for different subtypes of breast cancer. An analysis of the tiles by a domain expert shows that our approach can lead to the discovery of novel insights.
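
To make the notion of ranked data concrete, the following minimal sketch converts a numeric matrix into per-row rankings and evaluates one candidate tile. The function names `rank_rows` and `mean_rank` are hypothetical, and the mean rank is only an illustrative stand-in for a tile quality measure; it is not the scoring function introduced in the paper. The sketch also assumes that no row contains ties, so that each row yields a complete ranking.

```python
def rank_rows(matrix):
    """Replace each row by the ranks of its values (1 = smallest).

    Assumes no ties within a row; if ties occur, they are broken
    by column index because Python's sort is stable.
    """
    ranked = []
    for row in matrix:
        # Column indices ordered by increasing value within this row.
        order = sorted(range(len(row)), key=lambda j: row[j])
        ranks = [0] * len(row)
        for position, j in enumerate(order, start=1):
            ranks[j] = position
        ranked.append(ranks)
    return ranked


def mean_rank(ranked, rows, cols):
    """Average rank inside the tile given by a row set and a column set.

    Illustrative stand-in only, not the paper's scoring function.
    """
    cells = [ranked[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)


# Rows on very different numeric scales become comparable after ranking.
data = [
    [0.2, 5.0, 3.1, 4.4],
    [100.0, 900.0, 300.0, 800.0],
    [7.0, 1.0, 3.0, 2.0],
]
ranked = rank_rows(data)
tile_score = mean_rank(ranked, rows=[0, 1], cols=[1, 3])
```

Here the first two rows receive identical rankings despite their different scales, and the tile formed by rows 0 and 1 with columns 1 and 3 has a mean rank well above the per-row average of 2.5, which is the kind of area with consistently high ranks that ranked tiling looks for.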
