Improving protein complex classification accuracy using amino acid composition profile

Protein complex prediction approaches rest on two assumptions: that complexes have dense protein-protein interactions, and that their subunits share high functional similarity. We examined these assumptions by studying the subunits' interaction topology, sequence similarity, and molecular function in human and yeast protein complexes. Including the physicochemical properties of amino acids can provide a better understanding of protein complex properties, and principal component analysis is carried out to determine the major features. Combining amino acid composition profile information with an SVM classifier serves as an effective post-processing step for complex classification. The improvement relies on primary sequence information only, which is easy to obtain.
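As a rough illustration of the kind of sequence-only feature described above, the sketch below computes a 20-dimensional amino acid composition vector per subunit and averages it over a candidate complex. The function names (`composition`, `complex_profile`) and the averaging step are assumptions made for this example; the abstract does not specify how the profile is actually constructed. Vectors of this kind could then be reduced with principal component analysis and passed to an SVM classifier.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard residue in one sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def complex_profile(subunit_sequences):
    """Average the per-subunit composition vectors of a candidate complex.

    Averaging is an illustrative assumption, not the paper's stated method.
    """
    vectors = [composition(seq) for seq in subunit_sequences]
    if not vectors:
        raise ValueError("a complex needs at least one subunit sequence")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]
```

For example, `complex_profile(["MKTAYIAK", "GSHMLEDP"])` returns a list of 20 fractions that sum to 1, one feature vector per candidate complex.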
