論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
非重疊反轉距離的計算方法 An Efficient Algorithm for the Distance Computation of Non-overlapping Inversions |
||
系所名稱 Department |
|||
畢業學年期 Year, semester |
語文別 Language |
||
學位類別 Degree |
頁數 Number of pages |
52 |
|
研究生 Author |
|||
指導教授 Advisor |
|||
召集委員 Convenor |
|||
口試委員 Advisory Committee |
|||
口試日期 Date of Exam |
2019-01-29 |
繳交日期 Date of Submission |
2019-02-04 |
關鍵字 Keywords |
非重疊、編輯距離、基因重組、共同前綴後綴、反轉 non-overlapping, edit distance, common proper prefix suffix, inversion, genome rearrangement |
||
統計 Statistics |
本論文已被瀏覽 5635 次,被下載 31 次 The thesis/dissertation has been browsed 5635 times, has been downloaded 31 times. |
中文摘要 |
給定兩個序列A和B,長度分別為m和n,編輯距離可視為使用最少花費的運算使得A轉變為B。在傳統的編輯距離問題中,允許使用的運算為插入、刪除與取代,在2004年由Gao等人提出一個包含非重疊反轉的編輯距離演算法,時間複雜度為O(m^2 n^2),其中反轉運算能與傳統的運算重疊,但不能與另一個反轉運算重疊。 此論文中,我們定義了非重疊反轉距離(EDI)問題:計算兩個序列的最短編輯距離,其中允許使用的運算為插入、刪除、取代與反轉,且彼此皆不能重疊。針對這個問題,我們提出了一個有效的計算方法,最差情況的時間複雜度為O(m^2 n),平均情況的時間複雜度為O(mn)。此外,我們也提出了增進演算法效率的改進方向。 |
Abstract |
Given two sequences with lengths m and n, the edit distance is the minimum cost required to transform one into the other. The operations involved in the traditional edit distance problem include insertion, deletion and replacement. In 2004, Gao et al. proposed an algorithm for calculating the edit distance with non-overlapping inversion, where an inversion may overlap the classic operations but not another inversion. The time complexity of their algorithm is O(m^2n^2). In this thesis, we introduce a new variant of the traditional edit distance problem, the edit distance with non-overlapping inversion (EDI) problem, which is to determine the minimum edit distance with four operations insertion, deletion, replacement and inversion. The used operations cannot overlap one another. We propose an efficient algorithm for the EDI problem. The time complexity is O(m^2n) in the worst case and O(mn) in the average case. We also present some directions for the possible improvement of our algorithm. |
目次 Table of Contents |
THESIS VERIFICATION FORM . . . . . . . . . . . . . . . . . . . . . . i ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . iii CHINESE ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x LIST OF SYMBOLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Chapter 2. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1 Sequence Alignment and Edit Distance . . . . . . . . . . . . . . . . . 4 2.2 Inversion and Transposition . . . . . . . . . . . . . . . . . . . . . . . 5 2.3 The Distance with Non-overlapping Inversion . . . . . . . . . . . . . 6 2.4 The Distance with Non-overlapping Inversion and Transposition . . . 8 2.4.1 The Algorithm of Ta et al. . . . . . . . . . . . . . . . . . . . 8 2.4.2 Hsu's Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Chapter 3. Our Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.1 A Brute-force Algorithm for the EDI Problem . . . . . . . . . . . . . 16 3.2 Improvement of the Inversion Length Table . . . . . . . . . . . . . . 19 3.3 Analysis of the Number of Inversions . . . . . . . . . . . . . . . . . . 24 3.4 Directions for Possible Improvement . . . . . . . . . . . . . . . . . . 27 3.4.1 Leftmost or Rightmost . . . . . . . . . . . . . . . . . . . . . . 27 3.4.2 Inversion Clustering . . . . . . . . . . . . . . . . . . . . . . . 29 3.4.3 Inversion Combination . . . . . . . . . . . . . . . . . . . . . . 32 Chapter 4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 |
參考文獻 References |
[1] V. Bafna and P. A. Pevzner, ``Genome rearrangements and sorting by reversals,'' SIAM Journal on Computing, Vol. 25, No. 2, pp. 272--289, 1996. [2] J. Berstel, A. Lauve, C. Reutenauer, and F. V. Saliola, Combinatorics on Words: Christoffel Words and Repetitions in Words. Providence, Rhode Island, USA: American Mathematical Society, 2009. [3] K. Borozdin, D. Kosolobov, M. Rubinchik, and A. M. Shur, ``Palindromic Length in Linear Time,'' 28th Annual Symposium on Combinatorial Pattern Match- ing (CPM 2017), Vol. 78 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 23:1--23:12, 2017. [4] D. Cantone, S. Faro, and E. Giaquinta, ``Approximate string matching allowing for inversions and translocations,'' Proceedings of the Prague Stringology Conference, Prague, Czech Republic, pp. 37--51, 2010. [5] A. Caprara, ``Sorting by reversals is difficult,'' Proceedings of the First Annual International Conference on Computational Molecular Biology, Santa Fe, New Mexico, USA, pp. 75--83, ACM, 1997. [6] D.-J. Cho, Y.-S. Han, and H. Kim, ``Alignment with non-overlapping inversions on two strings,'' Proceedings of the 8th International Workshop on Algorithms and Computation, Chennai, India, pp. 261--272, 2014. [7] G. Fici, T. Gagie, J. Krkkinen, and D. Kempa, ``A subquadratic algorithm for minimum palindromic factorization,'' Journal of Discrete Algorithms, Vol. 28, pp. 41--48, 2014. [8] Y. Gao, Z.-Z. Chen, G. Lin, R. Niewiadomski, Y. Wang, and J. Wu, ``A space-efficient algorithm for sequence alignment with inversions and reversals,'' Theoretical Computer Science, Vol. 325, No. 3, pp. 361--372, 2004. [9] S. Grabowski, S. Faro, and E. Giaquinta, ``String matching with inversions and translocation in linear average time (most of the time),'' Information Processing Letters, Vol. 111, pp. 516--520, 2011. [10] T.-C. Hsu, ``An algorithm for computing the distance of the non-overlapping inversion and transposition,'' Master thesis, Department of Computer Science and Engineering, National Sun Yat-sen University, Taiwan, 2017. [11] Y.-L. Huang and C. L. Lu, ``Sorting by reversal, generalized transpositions, and translocations using permutation groups,'' Journal of Computational Biology, Vol. 17, No. 5, pp. 685--705, 2010. [12] J. Kececioglu and D. Sankoff, ``Exact and approximation algorithms for the inversion distance between two chromosomes,'' Proceedings of the Annual Symposium on Combinatorial Pattern Matching, pp. 87--105, Springer, 1993. [13] D. E. Knuth, J. H. Morris, Jr., and V. R. Pratt, ``Fast pattern matching in strings,'' SIAM Journal on Computing, Vol. 6, No. 2, pp. 323--350, 1977. [14] R. Kolpakov and G. Kucherov, ``Finding maximal repetitions in a word in linear time,'' Proceedings of the 40th Annual Symposium on Foundations of Computer Science, New York, New York, USA, pp. 596--604, IEEE Computer Society, 1999. [15] V. Levenshtein, ``Binary codes capable of correcting deletions, insertions and reversal,'' Soviet Physics Doklady, Vol. 10, No. 8, pp. 707--710, 1966. [16] M. Lothaire, Combinatorics on Words. Cambridge, England: Cambridge University Press, 1997. [17] S. B. Needleman and C. D. Wunsch, ``A general method applicable to the search for similarities in the amino acid sequence of two proteins,'' Journal of Molecular Biology, Vol. 48, No. 3, pp. 443--453, 1970. [18] M. Schoniger and M. S. Waterman, ``A local algorithm for DNA sequence alignment with inversions,'' Bulletin of Mathematical Biology, Vol. 54, pp. 521-- 536, 1992. [19] D. Shapira and J. A. Storer, ``Edit distance with move operations,'' Proceedings of the Annual Symposium on Combinatorial Pattern Matching, Fukuoka, Japan, pp. 85--98, Springer, 2002. [20] T. T. Ta, C.-Y. Lin, and C. L. Lu, ``An efficient algorithm for computing non-overlapping inversion and transposition distance,'' Information Processing Letters, Vol. 116, pp. 744--749, 2016. [21] A. F. Vellozo, C. E. Alves, and A. P. do Lago, ``Alignment with non-overlapping inversions and translocations on two strings,'' Theoretical Computer Science, Vol. 575, pp. 90--101, 2015. [22] M. E. M. Walter, Z. Dias, and J. Meidanis, ``A new approach for approximating the transposition distance,'' Proceedings of the International Symposium on String Processing and Information Retrieval, La Coruna,Spain, pp. 27--29, 2000. |
電子全文 Fulltext |
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。 論文使用權限 Thesis access permission:自定論文開放時間 user define 開放時間 Available: 校內 Campus: 已公開 available 校外 Off-campus: 已公開 available |
紙本論文 Printed copies |
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。 開放時間 available 已公開 available |
QR Code |