A Characterization of the Number of Subsequences Obtained via the Deletion Channel

Motivated by the study of deletion channels, this paper presents improved bounds on the number of subsequences obtained from a binary string $X$ of length $n$ under $t$ deletions. It is known that the number of subsequences in this setting strongly depends on the number of runs in the string $X$ ; where a run is a maximal substring of the same character. Our improved bounds are obtained by a structural analysis of the family of $r$ -run strings $X$ , an analysis in which we identify the extremal strings with respect to the number of subsequences. Specifically, for every $r$ , we present $r$ -run strings with the minimum (respectively maximum) number of subsequences under any $t$ deletions; we perform an exact analysis of the number of subsequences of these extremal strings; and show that this number can be calculated in polynomial time.