Protein domain boundary prediction is usually an early step in understanding protein function and structure. Most current computational methods for domain boundary prediction suffer from low accuracy and limitations in handling multi-domain types, and some cannot be applied at all to certain targets, such as proteins with discontinuous domains. We developed DeepDom, an ab-initio protein domain boundary predictor based on a stacked bidirectional LSTM deep learning model. The model is trained on a large number of protein sequences without feature engineering such as sequence profiles. As a result, prediction with our method is much faster than with other methods, and the trained model can be applied to any type of target protein without constraint. We evaluated DeepDom by 10-fold cross-validation and by applying it to targets in different categories from CASP 8 and CASP 9. Comparison with other methods shows that DeepDom outperforms most current ab-initio methods and, in certain cases, even achieves better results than the top-level template-based method. The DeepDom code and the CASP 8 and CASP 9 test data we used are available on GitHub at https://github.com/yuexujiang/DeepDom.
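As an illustration of the 10-fold cross-validation mentioned above, the following is a minimal sketch of how a dataset can be partitioned into folds. It is not taken from the DeepDom code: the function name `k_fold_splits`, the `seed` parameter, and the index-based representation of the dataset are assumptions made for this example.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Hypothetical helper for illustration only; each index is assumed
    to stand for one protein sequence in the dataset.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    # Shuffle with a fixed seed so the partition is reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set: all indices that are not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_id, (train, test) in enumerate(k_fold_splits(25, k=10)):
        print(fold_id, len(train), len(test))
```

Each item is held out exactly once, and the model would be retrained on the remaining nine folds each time. How the authors actually partitioned their sequences is not stated in this abstract.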