[Back to Results | New Search]

URN etd-0725103-112256 Author Yun-Tai Lu Author's Email Address No Public. Statistics This thesis had been viewed 5352 times. Download 3689 times. Department Computer Science and Engineering Year 2002 Semester 2 Degree Master Type of Document Language English Title An Efficient Hilbert Curve-based

Clustering Strategy for Large Spatial DatabasesDate of Defense 2003-06-20 Page Count 82 Keyword hierarchical clustering grid-based clustering clustering space-filling curve spatial data mining Abstract Recently, millions of databases have been used and we need a new technique that can automatically transform the processed data into useful information and knowledge. Data mining is the technique of analyzing data to discover previously unknown information and spatial data mining is the branch of data mining that deals with spatial data. In spatial data mining, clustering is one of useful techniques for discovering interesting data in the underlying data objects. The problem of clustering is that give n data points in a d-dimensional metric space, partition the data points into k clusters such that the data points within a cluster are more similar to each other than data points in different clusters. Cluster analysis has been widely applied to many areas such as medicine, social studies, bioinformatics, map regions and GIS, etc. In recent years, many researchers have focused on finding efficient methods to the clustering problem. In general, we can classify these clustering algorithms into four approaches: partition, hierarchical, density-based, and grid-based approaches. The k-means algorithm which is based on the partitioning approach is probably the most widely applied clustering method. But a major drawback of k-means algorithm is that it is difficult to determine the parameter k to represent ``natural' cluster, and it is only suitable for concave spherical clusters. The k-means algorithm has high computational complexity and is unable to handle large databases. Therefore, in this thesis, we present an efficient clustering algorithm for large spatial databases. It combines the hierarchical approach with the grid-based approach structure. We apply the grid-based approach, because it is efficient for large spatial databases. Moreover, we apply the hierarchical approach to find the genuine clusters by repeatedly combining together these blocks. Basically, we make use of the Hilbert curve to provide a way to linearly order the points of a grid. Note that the Hilbert curve is a kind of space-filling curves, where a space-filling curve is a continuous path which passes through every point in a space once to form a one-one correspondence between the coordinates of the points and the one-dimensional sequence numbers of the points on the curve. The goal of using space-filling curve is to preserve the distance that points which are close in 2-D space and represent similar data should be stored close together in the linear order. This kind of mapping also can minimize the disk access effort and provide high speed for clustering. This new algorithm requires only one input parameter and supports the user in determining an appropriate value for it. In our simulation, we have shown that our proposed clustering algorithm can have shorter execution time than other algorithms for the large databases. Since the number of data points is increased, the execution time of our algorithm is increased slowly. Moreover, our algorithm can deal with clusters with arbitrary shapes in which the k-means algorithm can not discover. Advisory Committee Tei-Wei Kuo - chair

Chien-I Lee - co-chair

San-Yi Huang - co-chair

Ye-In Chang - advisor

Files indicate accessible in a year

etd-0725103-112256.pdf Date of Submission 2003-07-25