Evaluation of protein-protein interaction characteristics based on sequence and functional information, and database construction

Objective: Protein-protein interaction (PPI) studies are important for understanding DNA function and the functional mechanisms of genomic elements. Much work has been done on the computational prediction of PPIs, and these methods are important tools for determining PPI characteristics. However, much more work remains to be done on evaluating PPI characteristics and building a related database.

Methods: We extracted 27 PPI features for human proteins from gene sequences, protein sequences, and functional information, applied them to various classifiers, and evaluated the performance of all classifiers and features using ROC curves.

Results: Our analysis showed that logistic regression and Bayesian network classifiers performed best for evaluating PPI characteristics. Biological Process, Cellular Component, Molecular Function, Gene Expression Values, Organization, and the availability of domain-domain interactions were clearly more useful than the other characteristics. We also built an easy-to-use Human Protein Feature Database (HPFD).

Conclusion: We identified PPI characteristics that perform well in evaluating functional characteristics. However, some of these characteristics still need further optimization to improve PPI coverage.

Key words: Protein-protein interactions; Machine learning; Feature evaluations; Database
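The abstract states that features were evaluated by ROC curve but does not say how the curves were computed. As a minimal sketch, assuming each feature is scored on its own against labels of interacting (1) and non-interacting (0) protein pairs, the area under the ROC curve (AUC) can be obtained from the Mann-Whitney U statistic: it equals the probability that a randomly chosen interacting pair scores higher than a randomly chosen non-interacting pair, with ties counted as one half. The function names `roc_auc` and `rank_features`, the feature names, and the toy values below are hypothetical and are not taken from the study or from HPFD.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 = interacting pair (positive), 0 = non-interacting pair (negative).
    Tied scores receive their average rank, so each tied positive/negative
    pair contributes one half.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have equal length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")

    # Sort indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - n_pos(n_pos+1)/2; AUC = U / (n_pos * n_neg).
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def rank_features(features: Dict[str, List[float]],
                  labels: List[int]) -> List[Tuple[str, float]]:
    """Score each feature independently and sort by AUC, best first."""
    results = [(name, roc_auc(values, labels)) for name, values in features.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: three interacting and three non-interacting pairs.
    labels = [1, 1, 1, 0, 0, 0]
    features = {
        "go_bp_similarity": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
        "expression_correlation": [0.6, 0.2, 0.5, 0.4, 0.3, 0.1],
        "domain_interaction": [1, 1, 0, 1, 0, 0],  # binary feature with ties
    }
    for name, auc in rank_features(features, labels):
        print(f"{name}: AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation. Comparing whole classifiers (for example, logistic regression versus a Bayesian network) would apply the same `roc_auc` to each classifier's predicted scores on held-out pairs rather than to raw feature values.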