Responsive image
博碩士論文 etd-0104119-143816 詳細資訊
Title page for etd-0104119-143816
論文名稱
Title
非重疊反轉距離的計算方法
An Efficient Algorithm for the Distance Computation of Non-overlapping Inversions
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
52
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2019-01-29
繳交日期
Date of Submission
2019-02-04
關鍵字
Keywords
非重疊、編輯距離、基因重組、共同前綴後綴、反轉
non-overlapping, edit distance, common proper prefix suffix, inversion, genome rearrangement
統計
Statistics
本論文已被瀏覽 5635 次,被下載 31
The thesis/dissertation has been browsed 5635 times, has been downloaded 31 times.
中文摘要
給定兩個序列A和B,長度分別為m和n,編輯距離可視為使用最少花費的運算使得A轉變為B。在傳統的編輯距離問題中,允許使用的運算為插入、刪除與取代,在2004年由Gao等人提出一個包含非重疊反轉的編輯距離演算法,時間複雜度為O(m^2 n^2),其中反轉運算能與傳統的運算重疊,但不能與另一個反轉運算重疊。
此論文中,我們定義了非重疊反轉距離(EDI)問題:計算兩個序列的最短編輯距離,其中允許使用的運算為插入、刪除、取代與反轉,且彼此皆不能重疊。針對這個問題,我們提出了一個有效的計算方法,最差情況的時間複雜度為O(m^2 n),平均情況的時間複雜度為O(mn)。此外,我們也提出了增進演算法效率的改進方向。
Abstract
Given two sequences with lengths m and n, the edit distance is the minimum cost
required to transform one into the other. The operations involved in the traditional
edit distance problem include insertion, deletion and replacement. In 2004, Gao et
al. proposed an algorithm for calculating the edit distance with non-overlapping
inversion, where an inversion may overlap the classic operations but not another
inversion. The time complexity of their algorithm is O(m^2n^2). In this thesis, we
introduce a new variant of the traditional edit distance problem, the edit distance
with non-overlapping inversion (EDI) problem, which is to determine the minimum
edit distance with four operations insertion, deletion, replacement and inversion.
The used operations cannot overlap one another. We propose an efficient algorithm
for the EDI problem. The time complexity is O(m^2n) in the worst case and O(mn)
in the average case. We also present some directions for the possible improvement
of our algorithm.
目次 Table of Contents
THESIS VERIFICATION FORM . . . . . . . . . . . . . . . . . . . . . . i
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
CHINESE ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
LIST OF SYMBOLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Sequence Alignment and Edit Distance . . . . . . . . . . . . . . . . . 4
2.2 Inversion and Transposition . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 The Distance with Non-overlapping Inversion . . . . . . . . . . . . . 6
2.4 The Distance with Non-overlapping Inversion and Transposition . . . 8
2.4.1 The Algorithm of Ta et al. . . . . . . . . . . . . . . . . . . . 8
2.4.2 Hsu's Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Chapter 3. Our Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1 A Brute-force Algorithm for the EDI Problem . . . . . . . . . . . . . 16
3.2 Improvement of the Inversion Length Table . . . . . . . . . . . . . . 19
3.3 Analysis of the Number of Inversions . . . . . . . . . . . . . . . . . . 24
3.4 Directions for Possible Improvement . . . . . . . . . . . . . . . . . . 27
3.4.1 Leftmost or Rightmost . . . . . . . . . . . . . . . . . . . . . . 27
3.4.2 Inversion Clustering . . . . . . . . . . . . . . . . . . . . . . . 29
3.4.3 Inversion Combination . . . . . . . . . . . . . . . . . . . . . . 32
Chapter 4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
參考文獻 References
[1] V. Bafna and P. A. Pevzner, ``Genome rearrangements and sorting by reversals,''
SIAM Journal on Computing, Vol. 25, No. 2, pp. 272--289, 1996.
[2] J. Berstel, A. Lauve, C. Reutenauer, and F. V. Saliola, Combinatorics on Words:
Christoffel Words and Repetitions in Words. Providence, Rhode Island, USA:
American Mathematical Society, 2009.
[3] K. Borozdin, D. Kosolobov, M. Rubinchik, and A. M. Shur, ``Palindromic Length
in Linear Time,'' 28th Annual Symposium on Combinatorial Pattern Match-
ing (CPM 2017), Vol. 78 of Leibniz International Proceedings in Informatics
(LIPIcs), pp. 23:1--23:12, 2017.
[4] D. Cantone, S. Faro, and E. Giaquinta, ``Approximate string matching allowing
for inversions and translocations,'' Proceedings of the Prague Stringology
Conference, Prague, Czech Republic, pp. 37--51, 2010.
[5] A. Caprara, ``Sorting by reversals is difficult,'' Proceedings of the First Annual
International Conference on Computational Molecular Biology, Santa Fe, New
Mexico, USA, pp. 75--83, ACM, 1997.
[6] D.-J. Cho, Y.-S. Han, and H. Kim, ``Alignment with non-overlapping inversions
on two strings,'' Proceedings of the 8th International Workshop on Algorithms
and Computation, Chennai, India, pp. 261--272, 2014.
[7] G. Fici, T. Gagie, J. Krkkinen, and D. Kempa, ``A subquadratic algorithm for
minimum palindromic factorization,'' Journal of Discrete Algorithms, Vol. 28,
pp. 41--48, 2014.
[8] Y. Gao, Z.-Z. Chen, G. Lin, R. Niewiadomski, Y. Wang, and J. Wu, ``A
space-efficient algorithm for sequence alignment with inversions and reversals,''
Theoretical Computer Science, Vol. 325, No. 3, pp. 361--372, 2004.
[9] S. Grabowski, S. Faro, and E. Giaquinta, ``String matching with inversions and
translocation in linear average time (most of the time),'' Information Processing
Letters, Vol. 111, pp. 516--520, 2011.
[10] T.-C. Hsu, ``An algorithm for computing the distance of the non-overlapping
inversion and transposition,'' Master thesis, Department of Computer Science
and Engineering, National Sun Yat-sen University, Taiwan, 2017.
[11] Y.-L. Huang and C. L. Lu, ``Sorting by reversal, generalized transpositions, and
translocations using permutation groups,'' Journal of Computational Biology,
Vol. 17, No. 5, pp. 685--705, 2010.
[12] J. Kececioglu and D. Sankoff, ``Exact and approximation algorithms for the
inversion distance between two chromosomes,'' Proceedings of the Annual
Symposium on Combinatorial Pattern Matching, pp. 87--105, Springer, 1993.
[13] D. E. Knuth, J. H. Morris, Jr., and V. R. Pratt, ``Fast pattern matching in
strings,'' SIAM Journal on Computing, Vol. 6, No. 2, pp. 323--350, 1977.
[14] R. Kolpakov and G. Kucherov, ``Finding maximal repetitions in a word in linear
time,'' Proceedings of the 40th Annual Symposium on Foundations of Computer
Science, New York, New York, USA, pp. 596--604, IEEE Computer Society,
1999.
[15] V. Levenshtein, ``Binary codes capable of correcting deletions, insertions and
reversal,'' Soviet Physics Doklady, Vol. 10, No. 8, pp. 707--710, 1966.
[16] M. Lothaire, Combinatorics on Words. Cambridge, England: Cambridge
University Press, 1997.
[17] S. B. Needleman and C. D. Wunsch, ``A general method applicable to the
search for similarities in the amino acid sequence of two proteins,'' Journal of
Molecular Biology, Vol. 48, No. 3, pp. 443--453, 1970.
[18] M. Schoniger and M. S. Waterman, ``A local algorithm for DNA sequence
alignment with inversions,'' Bulletin of Mathematical Biology, Vol. 54, pp. 521--
536, 1992.
[19] D. Shapira and J. A. Storer, ``Edit distance with move operations,'' Proceedings
of the Annual Symposium on Combinatorial Pattern Matching, Fukuoka, Japan,
pp. 85--98, Springer, 2002.
[20] T. T. Ta, C.-Y. Lin, and C. L. Lu, ``An efficient algorithm for computing
non-overlapping inversion and transposition distance,'' Information Processing
Letters, Vol. 116, pp. 744--749, 2016.
[21] A. F. Vellozo, C. E. Alves, and A. P. do Lago, ``Alignment with non-overlapping
inversions and translocations on two strings,'' Theoretical Computer Science,
Vol. 575, pp. 90--101, 2015.
[22] M. E. M. Walter, Z. Dias, and J. Meidanis, ``A new approach for approximating
the transposition distance,'' Proceedings of the International Symposium on
String Processing and Information Retrieval, La Coruna,Spain, pp. 27--29, 2000.
電子全文 Fulltext
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
開放時間 available 已公開 available

QR Code