Motif Recognition Using LVQ Classifiers with Overlap-based Similarity Metrics
Identifying the locations and specificities of
DNA-protein binding sites (also termed motifs) is an
important step towards understanding the mechanisms of
gene expression. To save experimental cost and time,
computational approaches have received increasing
interest and have shown good potential for solving
this problem. Given a set of known motif instances associated
with a transcription factor, motif recognition becomes
a biological data classification problem in which the
datasets are markedly imbalanced.
This paper addresses the problem of single-motif
recognition using machine learning techniques. We first
develop an overlap-based similarity metric (OSIM) to
compare DNA sub-sequences. As an application of the
metric to motif recognition, we then propose a motif
recognition system that uses Learning Vector
Quantization 1 (LVQ1) as its primary classifier. In the
system, we replace the Euclidean norm of LVQ1 with OSIM
and introduce corresponding modifications to the
winning-prototype update and the classification process. The
system also integrates a new sampling technique
to handle the imbalance of biological datasets.
Finally, we compare the recognition capability of our
motif recognition approach with that of P-Match
and three well-known learner models, namely Neural
Networks (NN), Support Vector Machines (SVM), and
Learning Vector Quantization 1 (LVQ1). Experimental
results show that, with the support of OSIM and the
sampling method, the learner models can produce high
recall rates but quite low precision rates on the tested
datasets.
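
The idea of swapping the Euclidean norm in LVQ1 for a similarity measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the abstract: OSIM is not defined here, so `match_similarity` is a hypothetical stand-in (a plain dot product over one-hot encoded bases), and the encoding, the learning rate `alpha`, and all function names are assumptions made for the example. The only structural change from textbook LVQ1 is that the winning prototype is the one with the highest similarity rather than the smallest distance.

```python
from typing import Callable, List, Sequence

ALPHABET = "ACGT"
Vector = List[float]


def one_hot(kmer: str) -> Vector:
    """Encode a DNA k-mer as a flat one-hot vector of length 4 * len(kmer)."""
    vec: Vector = []
    for base in kmer:
        vec.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return vec


def match_similarity(x: Sequence[float], w: Sequence[float]) -> float:
    """Hypothetical placeholder similarity (NOT OSIM): dot product of x and w."""
    return sum(a * b for a, b in zip(x, w))


def winner(x: Sequence[float], protos: List[Vector],
           similarity: Callable[[Sequence[float], Sequence[float]], float]) -> int:
    """Index of the most similar prototype (argmax replaces argmin of distance)."""
    return max(range(len(protos)), key=lambda i: similarity(x, protos[i]))


def train_lvq1(samples: List[Vector], labels: List[int],
               prototypes: List[Vector], proto_labels: List[int],
               similarity=match_similarity,
               alpha: float = 0.1, epochs: int = 20) -> List[Vector]:
    """LVQ1 training loop with a pluggable similarity measure."""
    protos = [list(p) for p in prototypes]  # copy so the caller's data is untouched
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            k = winner(x, protos, similarity)
            # Attract the winner if its label matches the sample, repel it otherwise.
            sign = 1.0 if proto_labels[k] == y else -1.0
            protos[k] = [w + sign * alpha * (xi - w) for xi, w in zip(x, protos[k])]
    return protos


def classify(x: Sequence[float], protos: List[Vector], proto_labels: List[int],
             similarity=match_similarity) -> int:
    """Assign the label of the most similar prototype."""
    return proto_labels[winner(x, protos, similarity)]
```

In use, one prototype per class could be initialised from a known motif instance and a background sub-sequence, trained with `train_lvq1`, and then queried with `classify(one_hot(candidate), protos, proto_labels)`. A real OSIM implementation would be passed in through the `similarity` argument; the sampling step mentioned in the abstract is not modelled here.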