
Ranked Tiling

Tiling is a well-known pattern mining technique. Traditionally, it discovers large areas of ones in binary databases or matrices, where an area is defined by a set of rows and a set of columns. In this paper, we introduce the novel problem of ranked tiling, which is concerned with finding interesting areas in ranked data. In this data, each transaction defines a complete ranking of the columns. Ranked data occurs naturally in applications like sports or other competitions. It is also a useful abstraction when dealing with numeric data in which the rows are incomparable. We introduce a scoring function for ranked tiling, as well as an algorithm using constraint programming and optimization principles. We empirically evaluate the approach on both synthetic and real-life datasets, and demonstrate the applicability of the framework in several case studies. One case study involves a heterogeneous dataset concerning the discovery of biomarkers for different subtypes of breast cancer patients. An analysis of the tiles by a domain expert shows that our approach can lead to the discovery of novel insights.
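To make the notion of ranked data concrete, the following minimal sketch converts a numeric matrix into row-wise rankings and evaluates one candidate tile. It is an illustration only: the abstract does not specify the paper's scoring function, so the `mean_rank` score, the function names, and the toy matrix are all assumptions introduced here, not the authors' method.

```python
def rank_rows(matrix):
    """Replace each value by its rank within its own row (1 = smallest).

    Assumes no ties within a row, so every row becomes a complete
    ranking of the columns.
    """
    ranked = []
    for row in matrix:
        # Column indices ordered by increasing value in this row.
        order = sorted(range(len(row)), key=lambda j: row[j])
        ranks = [0] * len(row)
        for position, j in enumerate(order, start=1):
            ranks[j] = position
        ranked.append(ranks)
    return ranked


def mean_rank(ranked, rows, cols):
    """Hypothetical tile score: average rank over the tile's cells.

    This is NOT the scoring function from the paper; it only shows
    how a tile (a set of rows and a set of columns) can be evaluated.
    """
    cells = [ranked[i][j] for i in rows for j in cols]
    return sum(cells) / len(cells)


# Toy data: rows 0 and 1 are on very different scales (incomparable
# as raw numbers), yet they rank the columns identically.
data = [
    [0.2, 5.1, 4.8, 0.1],
    [30.0, 900.0, 850.0, 10.0],
    [7.0, 3.0, 9.0, 8.0],
]

ranked = rank_rows(data)
tile_score = mean_rank(ranked, rows=[0, 1], cols=[1, 2])
print(ranked)      # [[2, 4, 3, 1], [2, 4, 3, 1], [2, 1, 4, 3]]
print(tile_score)  # 3.5
```

With four columns, the average rank in any row is 2.5, so a tile whose mean rank is 3.5 covers cells that rank consistently high across its rows. Ranking each row separately is what allows rows measured on different scales to be analysed together.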

Luc De Raedt | Kathleen Marchal | Siegfried Nijssen | Thanh Le Van | Matthijs van Leeuwen | Ana Carolina Fierro
