Title page for etd-0830115-201456


URN etd-0830115-201456
Author Hsiang-Erh Lai
Author's Email Address Not public.
Statistics This thesis has been viewed 5567 times and downloaded 2 times.
Department Electrical Engineering
Year 2015
Semester 1
Degree Master
Type of Document
Language English
Title Big Data Analytics Platform Establishment: Efficiency Analysis and Spam Email Filtering
Date of Defense 2015-09-11
Page Count 69
Keyword
  • TF-IDF
  • naive Bayes classifier
  • distributed computing
  • spam email filtering
  • Hadoop
  • big data
Abstract The era of big data has arrived. Beyond its benefits, big data brings new challenges for data analysis, and the Hadoop platform was proposed to address them. Hadoop uses MapReduce and HDFS to analyze and store big data efficiently on a cluster of commodity computers. In this thesis, we implemented a Hadoop big data platform and carried out several efficiency analyses. In addition, we implemented a spam filtering system on the Hadoop platform. The system uses term frequency-inverse document frequency (TF-IDF) to extract keyword features from emails and a naive Bayes classifier to classify each email as spam or non-spam. In the experiments, we compared our system with SpamAssassin, a robust spam filter widely used on Linux. The experimental results show that our system detects most of the spam that SpamAssassin detects and rarely misclassifies non-spam emails as spam. Most importantly, it runs 10 times faster than SpamAssassin.
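The two stages named in the abstract, TF-IDF feature extraction followed by naive Bayes classification, can be illustrated with a minimal single-machine sketch. This is not the thesis implementation and does not run on Hadoop; the function names, the smoothed IDF formula, the Laplace smoothing, and the four toy emails are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real email parsing.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one tokenized doc; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}

def train_nb(vectors, labels, vocab):
    """Multinomial naive Bayes over TF-IDF weights with Laplace (add-one) smoothing."""
    class_docs = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            weight[y][t] += w
    model = {}
    for y, n_docs in class_docs.items():
        denom = sum(weight[y].values()) + len(vocab)
        model[y] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_lik": {t: math.log((weight[y][t] + 1.0) / denom) for t in vocab},
        }
    return model

def predict(model, vec):
    """Return the class with the highest log-posterior score for a TF-IDF vector."""
    scores = {
        y: p["log_prior"] + sum(w * p["log_lik"][t] for t, w in vec.items())
        for y, p in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy corpus; the thesis used real email data.
train = [
    ("win cash prize now", "spam"),
    ("cheap pills win money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached for review", "ham"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]
model = train_nb(vectors, labels, list(idf))

def classify(text):
    return predict(model, tfidf(tokenize(text), idf))

print(classify("win a cash prize"))
print(classify("monday project meeting"))
```

In a MapReduce setting, the document-frequency counts and the per-class weight sums are the natural candidates for distribution, since both are plain sums over emails; how the thesis actually partitions the work is not stated in the abstract.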
Advisory Committee
  • Wu-Chih Hu - chair
  • Wen-Chuan Wu - co-chair
  • Chih-Yang Lin - co-chair
  • Chia-Hung Yeh - advisor
Files
  • etd-0830115-201456.pdf
  • Indicated access: in-campus at 5 years and off-campus at 5 years.
Date of Submission 2015-09-30
