Artificial intelligence in bioinformatics

Abstract The contact map, a two-dimensional representation of a three-dimensional protein structure, plays an important role in protein structure prediction because it imposes crucial restraints on the exploration of protein conformations. In this chapter, we summarize the development of automated methods for contact map prediction, whose models generally fall into three classes: correlated mutation analysis, direct-correlation analysis, and supervised learning models. The first two classes are unsupervised algorithms, whereas the last requires training samples extracted from experimentally solved protein structures. Protein residue contact prediction is an extremely imbalanced, large-scale modeling problem, because the number of residue pairs grows quadratically with sequence length while only a small fraction of those pairs are in contact. This has motivated the recent successful applications of deep learning models to the problem. We also show in this chapter that sequence-encoding features extracted from multiple sequence alignments are one of the keys to enhancing predictive performance. As more accurate and faster models are proposed for this challenging problem, predicted contacts are expected to dramatically speed up protein 3-D structure prediction by providing reliable and timely spatial restraints between residues.
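To make the notion of a contact map and its class imbalance concrete, the following is a minimal sketch in Python, not the method of any predictor discussed in this chapter. It assumes one representative atom per residue and an 8 Å distance cutoff (a commonly used convention, treated here as an assumption). The function names `contact_map`, `num_pairs`, and `contact_fraction`, the `min_separation` parameter, and the toy straight-chain coordinates are all hypothetical and chosen only for illustration.

```python
import math


def contact_map(coords, threshold=8.0):
    """Build a symmetric L x L boolean matrix from per-residue 3-D coordinates.

    Entry [i][j] is True when residues i and j lie closer than `threshold`
    angstroms. The 8.0 default is an assumed, commonly used cutoff.
    The diagonal is left False (a residue is not paired with itself).
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def num_pairs(length):
    """Number of distinct residue pairs, L(L-1)/2, which grows quadratically."""
    return length * (length - 1) // 2


def contact_fraction(cmap, min_separation=1):
    """Fraction of pairs with sequence separation >= min_separation in contact."""
    n = len(cmap)
    pairs = 0
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            pairs += 1
            contacts += cmap[i][j]
    return contacts / pairs if pairs else 0.0


# Toy input (not real data): a straight chain with 3.8 angstrom spacing.
coords = [(3.8 * i, 0.0, 0.0) for i in range(50)]
cmap = contact_map(coords)
fraction = contact_fraction(cmap)
```

In this toy chain only residues at most two positions apart fall under the cutoff, so the positive class is a small minority of the `num_pairs(50)` candidate pairs. Real proteins fold back on themselves and have long-range contacts, but the positives remain sparse relative to the quadratic number of pairs.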