Machine learning in low-level microarray analysis

Machine learning and data mining have found a multitude of successful applications in microarray analysis, with gene clustering and classification of tissue samples being widely cited examples. Low-level microarray analysis -- often associated with the pre-processing stage within the microarray life-cycle -- has increasingly become an area of active research, one that has traditionally relied on techniques from classical statistics. This paper explores opportunities for the application of machine learning and data mining methods to several important low-level microarray analysis problems: monitoring gene expression, transcript discovery, genotyping, and resequencing. Relevant methods and ideas from the machine learning community include semi-supervised learning, learning from heterogeneous data, and incremental learning.
