Abstract |
Various approaches have been proposed for finding the longest common subsequence (LCS) of two sequences. The time complexities of these algorithms are usually $O(n^2)$ in the worst case, where $n$ is the length of input sequences. However, these algorithms would become infeasible when the input length, $n$, is very long. Recently, the $k$-LCS $(k ≥ 2)$ problem has become more attractive. Some algorithms have been proposed for solving the problem, but the execution time required for solving the $k$-LCS problem is still too long to be practical. In this thesis, we propose a genetic algorithm for solving the $k$-LCS problem with time complexity $O(Gpk(n + |P_j|))$, which $G$ is the number of generations, $p$ is the number of template patterns, $k$ is the number of input sequences, $n$ and $|P_j|$ are the length of input sequences and the length of template patterns, respectively. As our experimental results show, when $k$ is 20 and $n$ is 1000, the performance ratio ($|CS|/|LCS|$) of our algorithm is greater than 0.8, where $|CS|$ denotes the length of the solution we find, and $|LCS|$ represents the length of the real (optimal) LCS. Comparing the performance ratios with Expansion Algorithm and BNMAS Algorithm, our algorithm is much better than them when the number of input sequences varies from 2 to 20 and the length of the input sequences varies from 100 to 2000. |