Title page for etd-0829112-205635


URN etd-0829112-205635
Author Yu-tien Liao
Author's Email Address Not public.
Statistics This thesis has been viewed 5351 times and downloaded 333 times.
Department Computer Science and Engineering
Year 2011
Semester 2
Degree Master
Type of Document
Language zh-TW.Big5 Chinese
Title The Design of Fault Tolerance of Cluster Computing Platform
Date of Defense 2012-07-20
Page Count 70
Keyword
  • SLURM
  • cluster system
  • Job duplication
  • fault-tolerance
  • distributed computing
    Abstract When nodes fail in a distributed application service, handling the missing results is costly, and the failure also places additional load on the scheduler. To avoid recomputing the entire result set when a fault occurs, only the data of the failed nodes is recomputed on backup machines. This thesis therefore experiments with three methods (N + N nodes, N + 1 nodes, and N + 1 nodes with probability) and analyzes their pros and cons. The third method assigns each job a weight before dispatch, then converts the weight into a probability and a nice value (as defined by SLURM [1]) to influence the order in which the scheduler runs jobs. When a fault occurs, the results computed on the healthy nodes are returned to the control node, and the failed node's jobs are then reassigned, or not reassigned, to a backup machine to complete the results. Finally, we analyze the strengths and weaknesses of the three methods.
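    The abstract does not give the formula that turns a job weight into a probability and a nice value, so the following is only a minimal sketch under stated assumptions: weights are normalized into probabilities by dividing by their sum, and a higher probability maps linearly to a lower nice value (in SLURM, a larger nice value lowers a job's scheduling priority). The function names, the linear mapping, and the `max_nice` bound are all hypothetical and are not taken from the thesis.

```python
def weights_to_probabilities(weights):
    """Normalize job weights so they sum to 1 (assumed conversion)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {job: w / total for job, w in weights.items()}


def probability_to_nice(probability, max_nice=10000):
    """Map a probability in [0, 1] to a non-negative nice value.

    Assumption: a linear mapping in which a more probable (more
    important) job gets a smaller nice value and is scheduled earlier.
    max_nice is an illustrative bound, not a value from the thesis.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return round((1.0 - probability) * max_nice)


def nice_values(weights, max_nice=10000):
    """Compute a nice value for every job from its weight."""
    probabilities = weights_to_probabilities(weights)
    return {job: probability_to_nice(p, max_nice)
            for job, p in probabilities.items()}


if __name__ == "__main__":
    # Hypothetical job names and weights, for illustration only.
    for job, nice in nice_values({"job_a": 3, "job_b": 1}).items():
        print(f"{job}: nice={nice}")
```

    A value computed this way could then be passed to a job at submission time, for example through the `--nice` option of `sbatch`.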
    Advisory Committee
  • Cheng-Fu Chou - chair
  • Hsiao-Guang Wu - co-chair
  • Ying-Chih Lin - co-chair
  • Shi-Huang Chen - co-chair
  • Chun-Hung Lin - advisor
    Files
  • etd-0829112-205635.pdf
    Indicate in-campus access at 5 years and off-campus access at 5 years.
    Date of Submission 2012-08-29