A Comparison Study of Transcription Factor – DNA Binding Models

This study compares two widely used motif representations: Positional Weight Matrices (PWMs) and consensus sequences. In motif finding, the binding sites are not known a priori and the algorithm must search a large space of candidate binding sites. In this setting the PWM model may be difficult to learn because its search space is very large even for short motifs (R^N for a PWM of length N, where R is the set of real numbers between 0 and 1), and optimization methods used to search for the best PWM may converge to a local optimum. The consensus sequence, on the other hand, has a smaller, finite search space (15^N for a motif of length N), in which the global optimum is easier to find.
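
The size of the consensus search space can be illustrated with a minimal sketch. It assumes that the 15 symbols per position are the IUPAC nucleotide codes (the 15 non-empty subsets of {A, C, G, T}); the text above does not name the alphabet, so this is an assumption. The names `IUPAC`, `consensus_space_size`, and `matches` are hypothetical and chosen only for illustration.

```python
# Assumed alphabet: the 15 IUPAC nucleotide codes, each standing for a
# non-empty subset of the four bases {A, C, G, T}.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def consensus_space_size(n):
    """Number of distinct consensus sequences of length n (15^n)."""
    return len(IUPAC) ** n


def matches(consensus, site):
    """Return True if every base of `site` is allowed by the consensus symbol
    at the same position."""
    if len(consensus) != len(site):
        return False
    return all(base in IUPAC[sym] for sym, base in zip(consensus, site))


if __name__ == "__main__":
    for n in (6, 8, 10):
        print(n, consensus_space_size(n))
    print(matches("TATAWR", "TATAAA"))
```

The space grows exponentially with N but remains finite and countable, whereas a PWM of the same length has continuous-valued entries, which is the contrast drawn above.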