Identification of transcription factor binding sites based on the Chi-Square (χ²) distance of a probabilistic vector model

This paper describes a new approach for locating signals, such as promoter sequences, in nucleic acid sequences. The binding of a transcription factor (TF) to its DNA target site is a fundamental regulatory interaction. The most common model of TF binding specificity is the position weight matrix (PWM) [1], which assumes that binding positions are independent of one another. In many cases, however, this simplifying assumption does not hold. We present a Chi-Square (χ²) distance model [2], a novel probabilistic method for modeling TF-DNA interactions that is based on the distance between the profiles of component vectors and uses χ² distances to represent TF binding specificities. Simulation results show that the proposed approach identifies TF binding sites significantly better than the PWM method.
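As a rough illustration of scoring a candidate site by a χ² distance to a profile, the following Python sketch may help. It is not the model of [2]: the abstract does not give the formula, so the sketch assumes the common symmetric form d(p, q) = ½ Σ (p_i − q_i)² / (p_i + q_i), assumes that a "component vector" is the one-hot encoding of the k-mer at each position, and uses made-up toy sites. The function names `kmer_profiles`, `chi2_distance`, and `site_distance` are hypothetical. Setting k = 2 lets adjacent positions be profiled jointly, which a PWM cannot do.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def kmer_profiles(sites, k=1, pseudocount=1.0):
    """Frequency profile of the k-mer starting at each position of aligned sites."""
    kmers = ["".join(t) for t in product(BASES, repeat=k)]
    total = len(sites) + pseudocount * len(kmers)
    profiles = []
    for pos in range(len(sites[0]) - k + 1):
        counts = Counter(site[pos:pos + k] for site in sites)
        profiles.append({m: (counts[m] + pseudocount) / total for m in kmers})
    return profiles

def chi2_distance(p, q):
    """Symmetric chi-square distance between two profiles (assumed form)."""
    return 0.5 * sum((p[m] - q[m]) ** 2 / (p[m] + q[m])
                     for m in p if p[m] + q[m] > 0)

def site_distance(seq, profiles, k=1):
    """Sum over positions of the distance between the candidate's
    one-hot k-mer vector and the corresponding profile."""
    total = 0.0
    for pos, profile in enumerate(profiles):
        one_hot = {m: 1.0 if m == seq[pos:pos + k] else 0.0 for m in profile}
        total += chi2_distance(one_hot, profile)
    return total

# Toy, made-up aligned binding sites (assumption, for illustration only).
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT", "GATAAT"]
profiles = kmer_profiles(sites, k=2)
scores = {s: site_distance(s, profiles, k=2) for s in ("TATAAT", "GCGCGC")}
```

Under these assumptions a smaller distance means a closer match, so a candidate would be called a binding site when its distance falls below a chosen threshold.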