Noise and non-linearities in high-throughput data

High-throughput data analyses are becoming common in biology, communications, economics and sociology. The resulting large data sets are usually represented as matrices and can be regarded as knowledge networks. Spectral approaches have proved useful for extracting hidden information from such networks and for estimating missing data, but these methods rest essentially on linear assumptions. Physical models of matching, when applicable, often suggest non-linear mechanisms that may sometimes be identified as noise. Using non-linear models in data analysis, however, may require introducing many parameters, which lowers the statistical weight of the model. Depending on the quality of the data, a simple linear analysis may be more convenient than more complex approaches. In this paper, we show how a simple non-parametric Bayesian model may be used to explore the role of non-linearities and noise in synthetic and experimental data sets.
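As a rough illustration of the kind of linear, spectral estimation of missing data mentioned above, the following sketch fills the gaps in a synthetic low-rank matrix by iterating a truncated singular value decomposition. It is not the Bayesian model of the paper: the function name `svd_impute`, the chosen rank, the matrix sizes and the 10% missing fraction are all assumptions made only for this example.

```python
import numpy as np

def svd_impute(matrix, mask, rank, n_iter=200):
    """Estimate entries where `mask` is False by iterated truncated SVD.

    Hypothetical helper for illustration: observed entries are kept fixed,
    missing ones are repeatedly replaced by a rank-`rank` approximation.
    """
    # Start by filling the gaps with the mean of the observed entries.
    filled = np.where(mask, matrix, matrix[mask].mean())
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed values, update only the missing ones.
        filled = np.where(mask, matrix, low_rank)
    return filled

# Synthetic "knowledge network": an exactly rank-2 matrix (assumed setup).
rng = np.random.default_rng(0)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(true.shape) > 0.1        # True where an entry is observed
observed = np.where(mask, true, 0.0)       # missing entries are blanked out

estimate = svd_impute(observed, mask, rank=2)
mean_error = np.abs(estimate - true)[~mask].mean()
print("mean absolute error on missing entries:", mean_error)
```

Because the synthetic matrix here is exactly linear (low rank) and noise-free, this reconstruction works well; with non-linear generating mechanisms or noisy data, the same linear assumption would be expected to degrade, which is the regime the abstract is concerned with.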
