Text Mining Perspectives in Microarray Data Mining

Current microarray data mining methods, such as clustering, classification, and association analysis, rely heavily on statistical and machine learning algorithms to analyze large sets of gene expression data. In recent years, there has been growing interest in methods that discover patterns across multiple related data sources; gene expression data and the corresponding literature are one such pairing. This paper proposes a new approach to microarray data mining that combines text mining (TM) and information extraction (IE). TM is concerned with identifying patterns in natural language text, and IE is concerned with locating specific entities, relations, and facts in text. The paper surveys the state of the art in data mining methods for microarray data analysis, shows the limitations of these methods, and outlines how text mining could address them.
