Title page for etd-0821109-092325



URN etd-0821109-092325
Author Po-Cheng Wang
Author's Email Address Not public.
Statistics This thesis has been viewed 5337 times and downloaded 1322 times.
Department Computer Science and Engineering
Year 2008
Semester 2
Degree Master
Type of Document
Language English
Title Automatic Attribute Clustering and Feature Selection Based on Genetic Algorithms
Date of Defense 2009-07-16
Page Count 83
Keyword
  • k-means
  • reduct
  • genetic algorithms
  • feature clustering
  • feature selection
Abstract Feature selection is an important pre-processing step in data mining and machine learning. A good set of features can not only improve classification accuracy but also reduce the time needed to derive rules. It is especially useful when the number of attributes in the training data is very large. This thesis proposes three GA-based methods for attribute clustering and feature selection. In the first method, each feasible clustering result is encoded as a chromosome of positive integers, with one gene per attribute. The value of a gene represents the cluster to which the attribute belongs. The fitness of each individual is evaluated using both the average accuracy of attribute substitutions within clusters and the cluster balance. The second method extends the first to improve time performance. It proposes a new fitness function based on both accuracy and attribute dependency, which reduces the time spent scanning the database. The third method uses a different encoding to represent chromosomes and achieves faster convergence and better results than the second. Finally, experimental comparisons with the k-means clustering approach and with all combinations of attributes show that the proposed approaches achieve a good trade-off between accuracy and time complexity. In addition, after feature selection, rules derived from only the selected features may be hard to use if the values of some selected features cannot be obtained in the current environment. The proposed approaches solve this problem easily: an attribute with missing values can be replaced by another attribute in the same cluster. The proposed approaches thus provide flexible alternatives for feature selection.
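The integer encoding described for the first method, and the idea of replacing an unavailable attribute with another one from its cluster, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names decode and substitute, the zero-based attribute indices, and the example chromosome are all assumptions made for illustration, and the fitness function is deliberately left out because the abstract does not give its formula.

```python
from collections import defaultdict


def decode(chromosome):
    """Group attribute indices by the cluster label stored in each gene.

    chromosome[i] is the (positive integer) cluster of attribute i.
    """
    clusters = defaultdict(list)
    for attr_index, cluster_id in enumerate(chromosome):
        clusters[cluster_id].append(attr_index)
    return dict(clusters)


def substitute(chromosome, missing_attr, available):
    """Return an available attribute from the same cluster as missing_attr.

    Returns None when the cluster holds no other available attribute.
    (Hypothetical helper; the thesis may choose substitutes differently.)
    """
    cluster_id = chromosome[missing_attr]
    for attr_index, other_id in enumerate(chromosome):
        if (other_id == cluster_id
                and attr_index != missing_attr
                and attr_index in available):
            return attr_index
    return None


# Five attributes assigned to three clusters (illustrative values only).
chromosome = [1, 2, 1, 3, 2]
clusters = decode(chromosome)
replacement = substitute(chromosome, 0, available={1, 2, 3, 4})
```

Here attributes 0 and 2 share cluster 1, so attribute 2 can stand in for attribute 0 when its value cannot be obtained. Attribute 3 is alone in its cluster and has no substitute.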
Advisory Committee
  • Wen-Yang Lin - chair
  • Chung-Nan Lee - co-chair
  • Cha-Hwa Lin - co-chair
  • Tzung-Pei Hong - advisor
Files
  • etd-0821109-092325.pdf
  • indicated as accessible in a year
Date of Submission 2009-08-21
