Abstract | In biological sequence research, the positional weight matrix (PWM) is often used for motif signal detection. A set of experimentally verified oligonucleotides known to be functional subsequences, such as transcription factor (TF) binding sites, translational initiation sites, or pre-mRNA splicing sites, is collected and aligned. The frequency of each nucleotide (A, C, G, or T) at each column of the alignment is calculated and stored in the matrix. Once a PWM is constructed, it can be used to search a nucleotide sequence for subsequences that may perform the same function. The match between a subsequence and a PWM is usually described by a score function, which measures how close the subsequence is to the PWM relative to a given background. However, selecting threshold scores that legitimately qualify a subsequence as functional has been a great challenge. Many laboratories have attempted to tackle this problem, but there has been no significant breakthrough so far. In this chapter, we discuss the characteristics of a PWM and the factors that affect motif predictions, and we propose a new score function that is tied to the information content and statistical expectation of a PWM. We also apply this score function to PWMs from public databases and show that it compares favorably with the widely used Match method. |
---|
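
To make the workflow described in the abstract concrete, the following is a minimal sketch of building a PWM from aligned sites and scanning a sequence with the conventional log-odds score. It is not the score function proposed in this chapter, nor the Match method. The function names (`build_pwm`, `log_odds_score`, `scan`), the pseudocount of 1, the uniform background, and the toy sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Return per-column nucleotide frequencies for equal-length aligned sites.

    The pseudocount (an assumption of this sketch) keeps every frequency
    above zero so that the log-odds score below is always defined.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    pwm = []
    for col in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in BASES})
    return pwm


def log_odds_score(pwm, subseq, background=None):
    """Sum over columns of log2(PWM frequency / background frequency)."""
    background = background or {base: 0.25 for base in BASES}  # assumed uniform
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(pwm, subseq)
    )


def scan(pwm, sequence, background=None):
    """Score every window of PWM width; return (start, score) pairs."""
    width = len(pwm)
    return [
        (start, log_odds_score(pwm, sequence[start:start + width], background))
        for start in range(len(sequence) - width + 1)
    ]


# Toy aligned sites, invented for illustration only.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
hits = scan(pwm, "GGTATAATCC")
best_start, best_score = max(hits, key=lambda hit: hit[1])
```

The scan returns a score for every window, and deciding which of those scores is high enough to call a functional site is exactly the threshold-selection problem the abstract raises.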