Normalized maximum likelihood models for genomics

We present normalized maximum likelihood (NML) models for discrete model classes and show how to apply the minimum description length (MDL) principle to them to obtain structure information. We then summarize methods derived in our previous work and treat the usual discrete models in a unified manner. In the last part we describe applications of the proposed models to disease classification.
