Thesis/dissertation record etd-0913112-052307: detailed information



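Name: 黃為喆 (Wei-Jhe Huang)  E-mail: freedomlife0212@gmail.com
Department: Information Management (graduate program)
Degree: Master  Graduation term: 2nd semester, academic year 100 (2011-2012)
Title (Chinese): 在雲端運算環境下使用分散式演化式演算法推導大型基因調控網路 (Inferring large gene regulatory networks with a distributed evolutionary algorithm in a cloud computing environment)
Title (English): Applying MapReduce Island-based Genetic Algorithm-Particle Swarm Optimization to the inference of large Gene Regulatory Network in Cloud Computing environment
Files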
  • etd-0913112-052307.pdf
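  • This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
    Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the text without authorization.
    Thesis access rights

    Electronic thesis: user-defined access: available on campus after 5 years, off campus after 5 years

    Language/pages: Chinese/74
    Statistics: this thesis has been viewed 5381 times and downloaded 580 times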
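    Abstract (Chinese) In bioinformatics today, constructing a large gene regulatory network incurs a high computational cost. Under resource and cost constraints, many researchers have turned to distributed computing, pooling the processing power of many personal computers to complete time-consuming computations. As cloud computing has matured in recent years, academia and industry have applied it widely to computation on large data sets. Hadoop is currently the best-known and most reliable open-source cloud computing framework. It supports the MapReduce distributed computing mechanism and can run distributed jobs on a cluster built from any combination of virtual machines or physical computers. As a highly abstracted framework, it lets users develop MapReduce programs that perform large-scale data analysis in any Hadoop environment, and it provides robust data backup and recovery mechanisms.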
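    Particle swarm optimization (PSO), proposed by Eberhart and Kennedy in 1995, is a population-based optimization search method. GAPSO is an improved form of PSO that incorporates the selection, crossover, and mutation mechanisms of the genetic algorithm, which gives it a better ability to find the optimum and to escape local optima. It is commonly used as the parameter optimization method for inferring gene regulatory networks.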
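The hybrid described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the record does not specify how GAPSO orders its operators, so the sketch assumes one common arrangement (a PSO velocity and position update, then rank selection, arithmetic crossover, and Gaussian mutation applied to the worse half of the swarm). The objective `sphere`, the function name `gapso_minimize`, and all parameter values are hypothetical.

```python
import random

def sphere(x):
    """Toy objective; a hypothetical stand-in for a network-fitting error."""
    return sum(v * v for v in x)

def gapso_minimize(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0,
                   w=0.7, c1=1.5, c2=1.5, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        # PSO step: inertia + pull toward personal best + pull toward global best.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

        # GA step (assumed arrangement): the better half are parents,
        # the worse half are replaced by mutated crossover children.
        order = sorted(range(n_particles), key=lambda i: f(pos[i]))
        half = n_particles // 2
        parents = order[:half]
        for i in order[half:]:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()  # arithmetic crossover weight
            child = [alpha * pos[a][d] + (1 - alpha) * pos[b][d] for d in range(dim)]
            for d in range(dim):
                if rng.random() < p_mut:  # Gaussian mutation, clipped to bounds
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[d] = min(hi, max(lo, child[d] + step))
            pos[i] = child
            vel[i] = [0.0] * dim

        # Refresh personal and global bests after both steps.
        for i in range(n_particles):
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Calling `gapso_minimize(sphere, dim=3)` returns the best position found and its objective value. In the setting described here, `f` would instead measure the mismatch between observed and simulated expression profiles.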
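    A gene regulatory network uses nodes and links to express the interactions among genes. Inferring such a network from experimentally observed time-series expression levels is one of the important topics in bioinformatics today. The S-System gene network model, which is based on nonlinear differential equations, can describe a biological network system and analyze its internal dynamics, and it is currently the most widely used method. Its improved variant, De.S-System, greatly reduces the dimensionality of the parameters and is suited to inferring large gene regulatory networks. Building a regulatory network of N genes requires handling a system of nonlinear differential equations with 2N(N+1) parameters, which is a large-scale parameter optimization problem with a high computational cost.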
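The parameter count can be checked against the standard S-System form. The record does not write the model out, so the symbols below follow the usual convention in the literature and are an assumption about the thesis's notation: X_i is the expression level of gene i, α_i and β_i are rate constants, and g_ij and h_ij are kinetic orders.

```latex
\begin{align}
  \frac{dX_i}{dt} &= \alpha_i \prod_{j=1}^{N} X_j^{g_{ij}}
                   - \beta_i \prod_{j=1}^{N} X_j^{h_{ij}},
                   \qquad i = 1, \dots, N \\
  \text{parameters} &= \underbrace{N}_{\alpha_i} + \underbrace{N}_{\beta_i}
                     + \underbrace{N^2}_{g_{ij}} + \underbrace{N^2}_{h_{ij}}
                     = 2N(N+1) \\
  N = 125 &\;\Rightarrow\; 2 \cdot 125 \cdot 126 = 31500
\end{align}
```

If De.S-System denotes the decoupled form, in which each gene's equation is fitted separately (an assumption, since the record does not expand the name), each of the N subproblems has only 2(N+1) parameters, that is 252 for N = 125.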
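    This study proposes the MapReduce Island-based GAPSO algorithm, which runs MapReduce distributed computation effectively in a Hadoop cloud environment to optimize the De.S-System parameters and build large gene regulatory networks. When a large gene regulatory network of 125 genes was inferred on a cluster of 26 computers, computation time fell by 90% relative to a single machine, a 9.7-fold speedup.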
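The two reported figures are consistent with each other, as a short calculation shows. The sketch below also derives a parallel efficiency, which the record does not report; that figure assumes all 26 computers act as workers, which may not hold if some serve as master nodes.

```python
def time_reduction(speedup):
    """Fraction of single-machine time saved at a given speedup."""
    return 1.0 - 1.0 / speedup

def parallel_efficiency(speedup, n_machines):
    """Speedup per machine; 1.0 would be ideal linear scaling."""
    return speedup / n_machines

print(f"time saved: {time_reduction(9.7):.1%}")           # about 89.7%, i.e. roughly 90%
print(f"efficiency: {parallel_efficiency(9.7, 26):.1%}")  # about 37.3% (assumes 26 workers)
```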
    Abstract (English) The construction of Gene Regulatory Networks (GRNs) is one of the most important issues in systems biology. Inferring a large-scale GRN with a nonlinear mathematical model is time-consuming because of the large number of network parameters involved. In recent years, cloud computing has been widely used to solve large-scale problems. Hadoop is currently the best-known and most reliable cloud computing framework. It allows users to analyze large amounts of data in a distributed environment through MapReduce, and it supports data backup and recovery mechanisms.
    This study proposes an Island-based GAPSO algorithm that runs in the Hadoop cloud computing environment to infer large-scale GRNs. GAPSO uses the position and velocity update functions of PSO and integrates the operators of the Genetic Algorithm. This approach is often used to search for optimal solutions of nonlinear mathematical models. Several sets of experiments were conducted in which the number of network nodes varied from 50 to 125. The experiments were executed in Hadoop distributed environments of 10, 20, and 26 computers. When inferring the network with 125 gene nodes on the largest Hadoop cluster (26 computers), the proposed framework ran up to 9.7 times faster than a stand-alone computer, which corresponds to a reduction of about 90% in the computation time of a single experimental run.
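The island-based scheme can be pictured as alternating map and reduce phases. The sketch below is a plain-Python simulation under stated assumptions, not Hadoop code and not the thesis's MR_IGAPSO: each map task evolves one island independently, and the reduce step performs ring migration of each island's best individual. The per-island optimizer is a simple placeholder (keep a random perturbation only if it improves), and every name and parameter is hypothetical.

```python
import random

def error(x):
    """Toy objective; a hypothetical stand-in for the model-fitting error."""
    return sum(v * v for v in x)

def evolve_island(task):
    """Map phase: evolve one island on its own for a few generations."""
    island_id, population, generations, seed = task
    rng = random.Random(seed)
    pop = [ind[:] for ind in population]
    for _ in range(generations):
        # Placeholder optimizer: perturb each individual, keep improvements.
        for k, ind in enumerate(pop):
            cand = [v + rng.gauss(0.0, 0.3) for v in ind]
            if error(cand) < error(ind):
                pop[k] = cand
    pop.sort(key=error)  # best individual first
    return island_id, pop

def migrate(results):
    """Reduce phase: each island's best replaces the next island's worst."""
    islands = [pop for _, pop in sorted(results)]
    bests = [pop[0][:] for pop in islands]
    for i, pop in enumerate(islands):
        pop[-1] = bests[(i - 1) % len(islands)]
    return islands

def run(n_islands=4, island_size=6, dim=3, rounds=5, generations=10, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dim)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for r in range(rounds):
        tasks = [(i, pop, generations, seed + 1000 * r + i)
                 for i, pop in enumerate(islands)]
        # Sequential here; on a cluster each task would run on a different node.
        results = list(map(evolve_island, tasks))
        islands = migrate(results)
    return min((ind for pop in islands for ind in pop), key=error)
```

Because islands exchange individuals only between rounds, the map tasks need no communication while they run, which is what makes the scheme a natural fit for iterative MapReduce jobs.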
    Keywords (Chinese)
  • Hadoop
  • 雲端運算 (Cloud Computing)
  • 粒子群最佳化 (Particle Swarm Optimization)
  • 基因調控網路 (Gene Regulatory Networks)
  • MapReduce
    Keywords (English)
  • Cloud Computing
  • Gene Regulatory Networks
  • Particle Swarm Optimization
  • Hadoop
  • MapReduce
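    Table of contents
    Abstract (Chinese)
    Abstract (English)
    Contents
    List of Figures
    List of Tables
    1. Introduction
    1.1 Research Background
    1.2 Research Motivation and Objectives
    2. Literature Review
    2.1 Gene Regulatory Network Inference
    2.1.1 Inferring Gene Regulatory Networks with the De.S-System Method
    2.2 Genetic Algorithm-Particle Swarm Optimization
    2.3 Cloud Computing
    2.2.1 Hadoop Cloud Computing Technology
    2.2.2 The MapReduce Computation Mechanism
    2.2.2.1 General MapReduce Computation
    2.2.2.2 Iterative MapReduce Computation
    3. Research Method and Architecture
    3.1 Inferring Gene Regulatory Networks with GAPSO
    3.2 Island-based Computation Model
    3.3 The MR_IGAPSO (MapReduce Island-based GAPSO) Algorithm
    3.3.1 MR_IGAPSO Architecture
    3.3.2 MR_IGAPSO Workflow
    3.3.3 MR_IGAPSO Algorithm
    3.4 Experimental Design
    4. Experimental Results and Discussion
    4.1 Experimental Environment
    4.2 Experimental Results
    4.2.1 Simulation Results of Stand-alone IGAPSO and GAPSO
    4.2.2 Comparison of MR_IGAPSO Computation Time
    4.2.2.1 25 Genes
    4.2.2.2 50 Genes
    4.2.2.3 100 Genes
    4.2.2.4 125 Genes
    4.2.2.5 Overall Comparison
    4.2.3 Hadoop Cluster Size versus Speedup
    4.2.3.1 25 Genes
    4.2.3.2 50 Genes
    4.2.3.3 100 Genes
    4.2.3.4 125 Genes
    4.2.3.5 Overall Comparison
    4.2.4 Overall Comparison of Inference Results
    5. Conclusion
    5.1 Research Results and Discussion
    5.2 Future Work
    6. References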
    Oral defense committee
  • 張德民 - Convener
  • 蔡玉娟 - Committee member
  • 李偉柏 - Advisor
  • 鄭炳強 - Advisor
    Oral defense date: 2012-07-20  Submission date: 2012-09-13
