Abstract |
Given three sequences A, B and C with lengths of m, n and r, respectively, the constrained longest common subsequence (CLCS) problem is to fi nd the LCS of A and B which contains C as the subsequence of the answer. The dynamic program- ming (DP) for solving CLCS, proposed by Chin et al., calculates two-dimensional lattices layer by layer. We find that the values of most corresponding CLCS lattice cells are identical in two consecutive layers when |∑| is small, where |∑| denotes the alphabet size. In this thesis, we clarify which lattice cells need to be calculated, and which lattices need not be calculated, since the calculation is redundant if two cells are equal in two consecutive layers. Accordingly, our algorithm calculates only some special boundary cells, instead of the whole 3-dimensional lattice in most cases. Our algorithm still requires O(mnr) time and O(mn) space to solve the CLCS problem in the worst case. In 2010, Deorowicz and Obstoj showed that the algorithm of Chin et al. has good performance when |∑| ≤ 20. As our experimental results show, our algorithm is faster than Chin's algorithm when |∑| ≤ 20. So our algorithm is better than most of the previous CLCS algorithms when |∑| is small. |