Title page for etd-0620116-134233



URN etd-0620116-134233
Author Wei-chen Chiang
Author's Email Address Not public.
Statistics This thesis has been viewed 5352 times and downloaded 0 times.
Department Computer Science and Engineering
Year 2015
Semester 2
Degree Master
Type of Document
Language zh-TW.Big5 Chinese
Title Data Mining of National Health Insurance Database:
Network Design of Computing Cluster and Automatic Query Language Generator
Date of Defense 2016-07-22
Page Count 102
Keyword
  • LXC
  • virtual machine
  • SQL
  • National Health Insurance
  • Impala
  • OpenFlow
  • Big Data
Abstract
    Big data technologies and applications have flourished in recent years. However, big data solutions remain costly because they require expensive hardware and professional analysis software.
    Taiwan's National Health Insurance program has accumulated treatment records for the entire population of Taiwan since 1995. In the past, Microsoft Excel and statistical packages such as SAS and SPSS were the main tools for analyzing these data. However, the data are too large for these tools to process, or processing takes too much time.
    Therefore, the purpose of our research is to find a low-cost solution for processing this big data. In addition to the platform, we design and implement a tool for automatic SQL code generation, which helps users who are not IT experts mine data from the platform. The proposed platform is a computing cluster built from many off-the-shelf personal computers, on which we apply a virtualization tool, Linux Containers (LXC), to ensure data security, system scalability, and utilization. We also use OpenFlow to ensure the required network bandwidth during data mining.
    We choose Cloudera Impala as the data mining tool because it uses standard SQL as the query language, which reduces the gap between users and the database. Impala takes an in-memory approach and therefore answers queries faster than tools based on MapReduce. Additionally, we develop the automatic SQL generator with an HTML5 interface so that non-IT users can quickly obtain correct SQL code and then execute it.
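    The abstract does not describe how the SQL generator works internally. As a rough illustration of the general idea only, the following minimal Python sketch turns a structured set of user choices (as a form-based interface might collect them) into a SELECT statement. The table name, column names, and the helper names `Condition`, `render_literal`, and `build_select` are all hypothetical and are not taken from the thesis or from the actual National Health Insurance database schema.

```python
import re
from dataclasses import dataclass
from typing import Sequence, Union

# Hypothetical schema for illustration only; these are NOT the real
# National Health Insurance table or column names.
ALLOWED_COLUMNS = {
    "outpatient_visits": {"patient_id", "visit_date", "diagnosis_code", "cost"},
}
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}
# Text values are limited to a conservative character set so that no
# dialect-specific quote escaping is needed.
SAFE_TEXT = re.compile(r"[A-Za-z0-9._-]+")


@dataclass(frozen=True)
class Condition:
    """One filter chosen by the user, e.g. cost > 1000."""
    column: str
    operator: str
    value: Union[int, float, str]


def render_literal(value):
    """Render a value as a SQL literal: numbers bare, safe text quoted."""
    if isinstance(value, bool):
        raise ValueError("boolean values are not supported in this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str) and SAFE_TEXT.fullmatch(value):
        return f"'{value}'"
    raise ValueError(f"unsupported value: {value!r}")


def build_select(table, columns: Sequence[str], conditions: Sequence[Condition] = ()):
    """Build a SELECT statement from whitelisted names and validated values."""
    if table not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown table: {table!r}")
    if not columns:
        raise ValueError("at least one column must be selected")
    allowed = ALLOWED_COLUMNS[table]
    for name in list(columns) + [c.column for c in conditions]:
        if name not in allowed:
            raise ValueError(f"unknown column: {name!r}")
    for cond in conditions:
        if cond.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {cond.operator!r}")

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if conditions:
        clauses = [
            f"{c.column} {c.operator} {render_literal(c.value)}" for c in conditions
        ]
        sql += " WHERE " + " AND ".join(clauses)
    return sql
```

    For example, selecting `patient_id` and `cost` from `outpatient_visits` with the conditions `diagnosis_code = "250"` and `cost > 1000` yields `SELECT patient_id, cost FROM outpatient_visits WHERE diagnosis_code = '250' AND cost > 1000`. Whitelisting names and restricting values is one way to keep generated queries well-formed for non-expert users; the thesis may take a different approach.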
Advisory Committee
  • Shihn-Sheng Wu - chair
  • Kun-Tai Lee - co-chair
  • Yen-Hsia Wen - co-chair
  • Jui-Hsiu Tsai - co-chair
  • Chun-Hung Lin - advisor
Files
  • etd-0620116-134233.pdf
  • Indicated access: on-campus in 5 years and off-campus in 5 years.
    Date of Submission 2016-08-22
