Title page for etd-0119109-190042



URN etd-0119109-190042
Author Kuan-hsien Chen
Author's Email Address spmisg@gmail.com
Statistics This thesis has been viewed 5559 times and downloaded 3643 times.
Department Information Management
Year 2008
Semester 1
Degree Master
Type of Document
Language Chinese (zh-TW, Big5)
Title Using Text mining Techniques for automatically classifying Public Opinion Documents
Date of Defense 2009-01-14
Page Count 102
Keyword
  • Text Categorization
  • Word Segmentation
  • Genetic Algorithms
  • Public Opinion
  • Text Mining
    Abstract In a democratic society, the number of public opinion documents grows day by day, creating a pressing need to classify these documents automatically. The traditional approach to document classification relies on word segmentation, together with stop-word lists, corpora, and grammatical analysis, to extract a document's key terms. However, as new terms emerge, traditional methods that depend on a dictionary or thesaurus may lose accuracy. This study therefore proposes a method that requires no pre-built dictionary or thesaurus and is applicable to documents written in any language, including documents containing unstructured text. Specifically, the classification method uses a genetic algorithm to achieve this goal.
       In this method, each training document is represented by several chromosomes, and the gene values of these chromosomes determine the document's characteristic terms. The fitness function, which the genetic algorithm uses to evaluate each evolved chromosome, takes into account the chromosome's similarity to the chromosomes of documents in other categories.
       This study evaluated the proposed method on FAQ data from the Taipei City Mayor's e-mail box, varying the length of the documents. The results show that the proposed method achieves an average accuracy of 89%, an average precision of 47%, and an average recall of 45%, and that the F-measure reaches up to 0.7.
       The results confirm that the number of training documents, the content of the training documents, the similarity between document categories, and the length of the documents all affect the effectiveness of the proposed method.
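The abstract does not specify how candidate terms are generated, how genes are encoded, or the exact fitness formula, so the following Python sketch is only an illustration under assumptions, not the thesis's method. It assumes (1) candidate terms are character n-grams, one dictionary-free option that also works for unsegmented Chinese text; (2) each gene is a bit that selects or drops one candidate term of a training document; and (3) a hypothetical fitness that rewards selected terms absent from documents of other categories and penalizes shared terms and oversized selections. The names `candidate_terms`, `fitness`, and `evolve`, and all parameter values, are illustrative. The sketch evolves a population of chromosomes for one document and keeps the best one.

```python
import random


def candidate_terms(text, n_min=2, n_max=3):
    """Hypothetical dictionary-free term extraction: every character n-gram.

    Whitespace is removed first, so the same code works for unsegmented
    text such as Chinese, where no word boundaries are marked.
    """
    text = "".join(text.split())
    terms = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            terms.add(text[i:i + n])
    return sorted(terms)


def fitness(chromosome, terms, other_terms, max_terms):
    """Assumed fitness: +1 per selected term not seen in other categories,
    -1 per selected term shared with them, -1 per term beyond max_terms."""
    selected = {t for t, gene in zip(terms, chromosome) if gene}
    distinctive = len(selected - other_terms)
    shared = len(selected & other_terms)
    excess = max(0, len(selected) - max_terms)
    return distinctive - shared - excess


def evolve(terms, other_terms, max_terms=5, pop_size=30, generations=60,
           mutation_rate=0.02, seed=0):
    """Simple generational GA: tournament selection, one-point crossover,
    bit-flip mutation, and elitism (the best chromosome always survives)."""
    if not terms:
        return [], 0
    rng = random.Random(seed)
    n = len(terms)
    p_on = min(1.0, max_terms / n)  # start with sparse chromosomes
    pop = [[1 if rng.random() < p_on else 0 for _ in range(n)]
           for _ in range(pop_size)]

    def score(c):
        return fitness(c, terms, other_terms, max_terms)

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    for _ in range(generations):
        children = [max(pop, key=score)[:]]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children

    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Made-up toy documents, not data from the thesis.
    terms = candidate_terms("pothole on the road needs repair")
    other = set(candidate_terms("question about the garbage truck schedule"))
    best, best_score = evolve(terms, other)
    print([t for t, g in zip(terms, best) if g], best_score)
```

Under this fitness the best achievable score is `max_terms`, reached by selecting exactly that many terms that never appear in the other category; a real system would compare against all other categories and would likely use a graded similarity rather than simple set overlap.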
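The abstract reports a high average accuracy (89%) alongside much lower average precision (47%) and recall (45%). It does not state how these averages were computed; one common scheme that produces this pattern is one-vs-rest evaluation per category followed by macro-averaging, where per-category accuracy also counts the many true negatives. The Python sketch below assumes that scheme; the function names and the toy labels are hypothetical, not the thesis's data.

```python
def one_vs_rest(y_true, y_pred, label):
    """Confusion-matrix metrics for one category treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (F1): harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def macro_average(y_true, y_pred):
    """Average each metric over all categories (macro-averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    rows = [one_vs_rest(y_true, y_pred, lab) for lab in labels]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


if __name__ == "__main__":
    # Hypothetical toy labels: each category is predicted correctly
    # for exactly one of its two documents.
    y_true = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"]
    y_pred = ["A", "B", "B", "C", "C", "D", "D", "E", "E", "A"]
    print(macro_average(y_true, y_pred))
```

In this toy case every category has one-vs-rest accuracy 0.8 (8 of 10 decisions correct, mostly true negatives) but precision, recall, and F1 of only 0.5, and only 5 of the 10 documents receive the correct label. With more categories the gap between one-vs-rest accuracy and precision/recall widens further.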
    Advisory Committee
  • Jack Hsu - chair
  • Feng-Yang Kuo - co-chair
  • S.-Y. Hwang - advisor
    Files
  • etd-0119109-190042.pdf
  • Access: available on campus immediately and off campus after one year
    Date of Submission 2009-01-19
