Title page for etd-0411116-090441



URN etd-0411116-090441
Author Yu-rong Chen
Author's Email Address Not public.
Statistics This thesis has been viewed 5559 times and downloaded 346 times.
Department Information Management
Year 2015
Semester 2
Degree Master
Type of Document
Language zh-TW.Big5 Chinese
Title Effectively Aggregating Big Data for Visualization
Date of Defense 2016-05-06
Page Count 58
Keyword
  • Data Discretization
  • Exploratory Data Analysis
  • Big data
  • Data Reduction
  • Data Visualization
Abstract With the rapid development of internet technologies, data is easily generated and collected. How useful that data is depends on how well enterprises or individuals can derive valuable information from it. Before performing more complex analysis, analysts need to understand the data, preferably visually, which leads to the approach of Exploratory Data Analysis (EDA). With EDA, analysts can uncover the patterns or characteristics of the data and then choose an appropriate model for further analysis. Common EDA techniques include graphing, tabulation, and equation fitting, which help analysts explore the data and identify its regularities. Unfortunately, when the volume of data is huge, traditional EDA methods may be inefficient.
    Our work uses R to develop EDA software, taking advantage of R's data exploration features and rich package libraries, and aims to visualize big data efficiently. By applying data reduction strategies, large volumes of data can be reduced to a meaningful data set of lower complexity and smaller size. Specifically, we apply binning to develop data reduction methods. Equal-width is the most common binning method for aggregating continuous variables. Although equal-width is highly efficient, it performs poorly on skewed data distributions. In this thesis, we compare three aggregation approaches, equal-width, equal-depth, and MHist, by assessing their time efficiency and accuracy.
    Experimental results show that both equal-depth and MHist achieve much higher accuracy than equal-width, at some cost in efficiency. MHist performs well across various data distributions but has the lowest efficiency. Equal-depth strikes a balance, with reasonable performance in both efficiency and accuracy.
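    The contrast between equal-width and equal-depth binning described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the thesis software is written in R, while this example uses Python, and the function names `equal_width_bins` and `equal_depth_bins`, the bin count of 10, and the synthetic exponential data are all assumptions made for illustration. MHist is omitted because the abstract does not describe its procedure.

```python
import random


def equal_width_bins(values, k):
    """Split [min, max] into k intervals of identical width; return per-bin counts.

    Hypothetical helper for illustration; a single pass over the data.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * k
    for v in values:
        idx = min(int((v - lo) / width), k - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return counts


def equal_depth_bins(values, k):
    """Sort the values and cut them into k groups of (nearly) equal size.

    Hypothetical helper for illustration; requires a sort before cutting.
    """
    ordered = sorted(values)
    n = len(ordered)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]


# Assumed test data: a right-skewed (exponential) sample.
random.seed(0)
data = [random.expovariate(1.0) for _ in range(10_000)]

width_counts = equal_width_bins(data, 10)
depth_groups = equal_depth_bins(data, 10)

print("equal-width counts per bin:", width_counts)
print("equal-depth counts per bin:", [len(g) for g in depth_groups])
print("equal-depth bin ranges:", [(round(g[0], 2), round(g[-1], 2)) for g in depth_groups])
```

    In this sketch, equal-width binning needs only one pass over the data but places most of a skewed sample in the first few bins, while equal-depth binning pays for a sort and in return gives every bin the same number of points, with narrow bins where the data is dense.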
Advisory Committee
  • Yi-ling Lin - chair
  • Yung-Jan Cho - co-chair
  • Chien-Hsiang Lee - co-chair
  • San-Yia Hwang - advisor
Files
  • etd-0411116-090441.pdf
  • Access: on campus after 2 years; off campus after 2 years.
Date of Submission 2016-05-11
