MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets

Advances in proteomic technologies continue to substantially accelerate the capability to generate experimental data on protein levels, states, and activities in biological samples. For example, studies of receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications or about the enzymes responsible for them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer the functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data, designed to overcome the typical dependence on specific a priori assumptions and expert knowledge about the technical aspects of clustering. The multiple clustering analysis methodology (‘MCAM’) combinatorially employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms to create a suite of clustering sets. These sets are then evaluated by their ability to produce biological insight, measured as statistical enrichment of metadata describing protein function, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationship between algorithmic parameters and the biological meaning that can be inferred, and we report the resulting biological predictions. We also applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent, incompletely overlapping measurements of phosphorylation sites in the network. We report specific and global differences in the ERBB network when it is stimulated with different ligands and when HER2 expression is altered.
Overall, we offer MCAM as a broadly applicable approach for the analysis of proteomic data that may help improve the current understanding of molecular networks in a variety of biological problems.
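The abstract does not name MCAM's specific transformations, metrics, algorithms, or enrichment statistic. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Every combination of a transformation, a distance metric, a set size k, and a clustering algorithm produces one clustering set. Each set is then scored by counting (cluster, metadata term) pairs that pass a one-sided hypergeometric enrichment test after Benjamini-Hochberg correction. The hypergeometric test and the FDR correction are assumed choices. The transformations (none, z-score, max-scaling), the metrics (Euclidean, Manhattan), and the agglomerative linkages standing in for "different clustering algorithms" are hypothetical. So are the toy time courses and the labels `kinaseX`/`kinaseY`.

```python
import itertools
import math
import statistics

# Hypothetical data transformations applied to each site's time course.
def identity(row):
    return list(row)

def zscore(row):
    mu, sd = statistics.fmean(row), statistics.pstdev(row)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in row]

def max_scale(row):
    m = max(abs(x) for x in row)
    return [x / m if m > 0 else 0.0 for x in row]

TRANSFORMS = {"none": identity, "zscore": zscore, "maxscale": max_scale}
METRICS = {
    "euclidean": math.dist,
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}
# Linkage rules stand in for "different clustering algorithms" (assumption).
LINKAGES = {"single": min, "complete": max, "average": statistics.fmean}

def agglomerative(points, k, metric, linkage):
    """Repeatedly merge the closest pair of clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            d = linkage([metric(points[p], points[q])
                         for p in clusters[i] for q in clusters[j]])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so popping j leaves index i valid
        clusters[i].extend(merged)
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

def hypergeom_upper_tail(x, N, K, n):
    """P(X >= x) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(x, min(K, n) + 1)) / math.comb(N, n)

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q, running = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q

def enrichment_score(labels, annotations, alpha=0.05):
    """Count (cluster, term) pairs enriched at BH-adjusted q < alpha."""
    N = len(labels)
    terms = sorted({t for ts in annotations for t in ts})
    pvals = []
    for c in set(labels):
        members = [i for i in range(N) if labels[i] == c]
        for t in terms:
            K = sum(t in annotations[i] for i in range(N))
            x = sum(t in annotations[i] for i in members)
            pvals.append(hypergeom_upper_tail(x, N, K, len(members)))
    return sum(q < alpha for q in benjamini_hochberg(pvals))

def mcam(data, annotations, set_sizes=(2, 3)):
    """Cluster and score every (transform, metric, k, linkage) combination."""
    results = []
    for (tn, tf), (mn, mf), k, (ln, lf) in itertools.product(
            TRANSFORMS.items(), METRICS.items(), set_sizes, LINKAGES.items()):
        labels = agglomerative([tf(r) for r in data], k, mf, lf)
        results.append(((tn, mn, k, ln), enrichment_score(labels, annotations)))
    return sorted(results, key=lambda r: r[1], reverse=True)  # stable sort

# Hypothetical toy data: 12 sites x 4 time points with two response shapes.
DATA = [
    [1.0, 5.0, 2.0, 1.0], [1.1, 4.8, 2.1, 0.9], [0.9, 5.2, 1.9, 1.1],
    [1.0, 4.9, 2.2, 1.0], [1.2, 5.1, 1.8, 0.8], [0.8, 5.0, 2.0, 1.2],
    [1.0, 1.0, 3.0, 6.0], [1.1, 0.9, 3.1, 5.8], [0.9, 1.1, 2.9, 6.2],
    [1.0, 1.2, 3.2, 5.9], [1.2, 0.8, 2.8, 6.1], [0.8, 1.0, 3.0, 6.0],
]
ANNOTATIONS = [{"kinaseX"}] * 6 + [{"kinaseY"}] * 6  # hypothetical labels

results = mcam(DATA, ANNOTATIONS)
for config, score in results[:3]:
    print(config, score)
```

The sketch ranks configurations by how much annotated biology they recover, not by internal cluster-validity indices. This reflects the evaluation idea described above. A real analysis would use many more parameter choices and curated annotations such as kinase-substrate and motif databases.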
