We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet $\Sigma$, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence $s \in \Sigma^n$ and a nonnegative integer $k \le n$, how many distinct subsequences of length $k$ does $s$ contain? A previous result by Chase states that this number is maximized by choosing $s$ as a repeated permutation of the alphabet. This has applications in DNA microarray production. (2) Number of $\rho$-restricted $\rho$-generated sequences: Given $s \in \Sigma^n$ and integers $k \ge 1$ and $\rho \ge 1$, how many distinct sequences in $\Sigma^k$ contain no single nucleotide repeat longer than $\rho$ and can be written as $s_1^{r_1}\dots s_n^{r_n}$ with $0 \le r_i \le \rho$ for all $i$? For $\rho = \infty$, the question becomes how many length-$k$ sequences match the regular expression $s_1^* s_2^* \dots s_n^*$. These considerations allow a detailed analysis of a new DNA sequencing technology (“454 sequencing”). (3) Exact length distribution of the longest increasing subsequence: Given $\Sigma = \{1,\dots,K\}$ and an integer $n \ge 1$, determine the number of sequences in $\Sigma^n$ whose longest strictly increasing subsequence has length $k$, where $0 \le k \le K$. This has applications to significance computations for chaining algorithms.
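To make problem (1) concrete, the following is a minimal sketch of one standard way to count distinct length-k subsequences by dynamic programming over prefixes. It is not taken from the paper and may differ from the authors' method. The function names `count_distinct_subsequences` and `brute_force_count` are hypothetical and chosen here for illustration only.

```python
from itertools import combinations


def count_distinct_subsequences(s, k):
    """Count distinct subsequences of length k in the sequence s.

    Illustrative prefix DP (an assumption, not the paper's algorithm):
    dp[i][j] is the number of distinct length-j subsequences of s[:i].
    """
    n = len(s)
    if k < 0 or k > n:
        return 0
    dp = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = 1  # the empty subsequence
    last = {}  # symbol -> 1-based position of its most recent occurrence
    for i in range(1, n + 1):
        c = s[i - 1]
        p = last.get(c)
        for j in range(1, k + 1):
            # Either skip s[i-1], or append it to a length-(j-1) subsequence.
            dp[i][j] = dp[i - 1][j] + dp[i - 1][j - 1]
            if p is not None:
                # Subsequences ending in c that were already counted when
                # c last occurred would otherwise be counted twice.
                dp[i][j] -= dp[p - 1][j - 1]
        last[c] = i
    return dp[n][k]


def brute_force_count(s, k):
    """Reference count by explicit enumeration; only feasible for short s."""
    return len(set(combinations(s, k)))
```

For example, `count_distinct_subsequences("abab", 2)` returns 4, corresponding to `ab`, `aa`, `ba`, and `bb`. The table has (n+1)(k+1) entries, so the sketch runs in O(nk) time, whereas the brute-force check enumerates all position subsets of size k and is only useful for validating small cases.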
[1] Rolf Niedermeier, et al. Invitation to Fixed-Parameter Algorithms. 2006.
[2] Steven Skiena, et al. The Algorithm Design Manual. 2020, Texts in Computer Science.
[3] Sven Rahmann. The shortest common supersequence problem in a microarray production setting. 2003, ECCB.
[4] P. J. Chase. Subsequence numbers and logarithmic concavity. 1976, Discret. Math.
[5] James R. Knight, et al. Genome sequencing in microfabricated high-density picolitre reactors. 2005, Nature.
[6] P. Diaconis, et al. Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem. 1999.