Multiple testing in large-scale contingency tables: inferring patterns of pair-wise amino acid association in beta-sheets

This study examines the feasibility of using multiple testing procedures to infer, cell by cell, whether the row and column categories of a contingency table are independent. In a simulation study, we compare the performance of several multiple testing procedures in a contingency table setting and demonstrate the relationship among the proportion of true null hypotheses, type I error, power, and the false discovery rate. Finally, we apply the proposed methodology to identify patterns of pair-wise association between amino acids involved in beta-sheet bridges in proteins. We identify a number of amino acid pairs that exhibit either strong or weak association.
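
To make the idea of cell-wise testing concrete, the following is a minimal sketch and not the procedure used in the study. It assumes that each cell is tested with an adjusted residual that is approximately standard normal under independence, and that the resulting p-values are passed to the Benjamini-Hochberg step-up procedure, which is only one of the procedures that could be compared. The function names, the FDR level `q`, and the requirement that every row and column total be positive and smaller than the grand total are all assumptions made for illustration.

```python
import math

def adjusted_residuals(table):
    """Adjusted residual for every cell of an r x c table of counts.

    Assumes every row and column total is positive and smaller than
    the grand total, so the variance below is never zero.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    resid = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            var = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out.append((observed - expected) / math.sqrt(var))
        resid.append(out)
    return resid

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals, q=0.05):
    """Return booleans marking the hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    cutoff = 0
    for rank, k in enumerate(order, start=1):
        if pvals[k] <= rank * q / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    reject = [False] * m
    for k in order[:cutoff]:
        reject[k] = True
    return reject

def cellwise_tests(table, q=0.05):
    """List (row, col, residual, p-value) for each cell flagged at level q."""
    resid = adjusted_residuals(table)
    flat = [z for row in resid for z in row]
    pvals = [two_sided_p(z) for z in flat]
    reject = benjamini_hochberg(pvals, q)
    ncol = len(table[0])
    return [(k // ncol, k % ncol, flat[k], pvals[k])
            for k in range(len(flat)) if reject[k]]
```

Calling `cellwise_tests(counts)` on a table of pair counts would return the flagged cells; a positive residual indicates a pair observed more often than independence predicts, and a negative residual a pair observed less often. The cell-level statistics within one table are not independent of one another, so the choice of procedure matters in practice.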
