Handling fuzzy gaps in sequential patterns: Application to health

Mining novel knowledge from numerical data is a non-trivial task that has received much attention in recent years. However, such data remain difficult to handle, especially when large volumes of values must be analyzed. In our work, we focus on biological data from DNA chips, which biologists study in order to discover new gene correlations that could help in understanding diseases such as breast cancer. In this framework, we consider the values from DNA microarrays, which convey the behavior of genes, and we aim to discover how these behaviors are correlated. These data are numerical values that can be ordered and sorted. In previous work, sequential patterns such as {(1 5)(2)} have been discovered, meaning that genes 1 and 5 have the same expression level, followed by gene 2, which has a higher expression value. However, such data are very noisy, and treating close values as strictly ordered is often wrong. We thus consider here fuzzy rankings based on a fuzzy partition provided by the experts. The resulting rules can then better characterize how genes are correlated.
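To make the contrast between a crisp ordering and a fuzzy ranking concrete, the following minimal sketch may help. It is not the authors' algorithm: it assumes expression values normalized to [0, 1], a hypothetical three-set triangular partition (low, medium, high) standing in for the expert-provided one, and a hypothetical membership threshold of 0.5. The function names are illustrative only.

```python
# Hypothetical fuzzy partition of normalized expression values in [0, 1].
# Each entry is (label, (a, b, c)) for a triangular set with support (a, c)
# and peak b. Real partitions would come from the domain experts.
PARTITION = [
    ("low", (-0.5, 0.0, 0.5)),
    ("medium", (0.0, 0.5, 1.0)),
    ("high", (0.5, 1.0, 1.5)),
]


def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy set (a, b, c), a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def crisp_sequence(expr):
    """Order genes by exact value; only strictly equal values share an itemset."""
    groups = {}
    for gene, value in expr.items():
        groups.setdefault(value, []).append(gene)
    return [tuple(sorted(groups[v])) for v in sorted(groups)]


def fuzzy_sequence(expr, threshold=0.5):
    """Group genes by fuzzy set, in increasing order of the sets.

    A gene joins an itemset when its membership degree reaches the
    (assumed) threshold, so close values fall into the same itemset.
    """
    sequence = []
    for _label, (a, b, c) in PARTITION:
        itemset = tuple(
            sorted(g for g, v in expr.items() if triangular(v, a, b, c) >= threshold)
        )
        if itemset:
            sequence.append(itemset)
    return sequence


# One sample: gene identifier -> normalized expression value (made-up numbers).
sample = {1: 0.10, 5: 0.14, 2: 0.80}
print(crisp_sequence(sample))  # [(1,), (5,), (2,)]
print(fuzzy_sequence(sample))  # [(1, 5), (2,)]
```

With the crisp ordering, the small difference between genes 1 and 5 is treated as a real ordering, whereas the fuzzy grouping places both in the "low" set and yields a sequence of the same shape as {(1 5)(2)}.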
