Determining the number of Clusters in On-Line Document Clustering Algorithm 


Vol. 14,  No. 7, pp. 513-522, Dec.  2007
10.3745/KIPSTB.2007.14.7.513


  Abstract

Clustering divides given data into groups and automatically uncovers hidden structure in the data. It analyzes data that are too difficult for people to inspect in detail and forms several clusters, each consisting of data with similar characteristics. An on-line document clustering system, which groups similar documents using the results of a search engine, aims to make information retrieval more convenient. Document clustering is performed automatically, without human intervention, so the number of clusters, which affects the clustering result, should also be determined automatically. In addition, one characteristic of an on-line system is that it must guarantee a fast response time. This paper proposes a method for automatically determining the number of clusters using geometrical information. The proposed method consists of two stages. In the first stage, the cluster centers are projected onto a low-dimensional plane; in the second stage, clusters are merged according to the distances between their centers in that plane. Experiments with real data showed that clustering performance improved and that the response time is suitable for an on-line setting.
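
The two stages described above can be illustrated with a minimal sketch. The abstract does not name the projection technique, the merge rule, or any parameter values, so everything specific here is an assumption made for illustration: PCA (via SVD) stands in for the projection, a fixed distance `threshold` with single-link merging stands in for the combination step, and the function names `project_centers`, `merge_by_distance`, and `estimate_num_clusters` are hypothetical.

```python
import numpy as np


def project_centers(centers, dim=2):
    """Stage 1 (assumed: PCA): project cluster centers onto a low-dimensional plane."""
    centered = centers - centers.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


def merge_by_distance(projected, threshold):
    """Stage 2 (assumed: single-link merge): group centers closer than `threshold`.

    Returns one group label per original cluster.
    """
    n = len(projected)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(projected[i] - projected[j]) < threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive integers in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


def estimate_num_clusters(centers, threshold, dim=2):
    """Return (number of merged clusters, group label per original cluster)."""
    centers = np.asarray(centers, dtype=float)
    labels = merge_by_distance(project_centers(centers, dim), threshold)
    return len(set(labels)), labels


# Five initial centers, two pairs of which nearly coincide (illustrative data).
initial_centers = [[0, 0, 0], [0.1, 0, 0], [5, 5, 0], [5, 5.1, 0], [10, 0, 3]]
k, labels = estimate_num_clusters(initial_centers, threshold=1.0)
```

In this sketch the initial clustering deliberately over-segments the data, and the merge step reduces the five initial centers to three groups. Working only on the cluster centers, rather than on every document, keeps the cost low, which fits the fast-response requirement of an on-line system.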



  Cite this article

[IEEE Style]

T. C. Jee, H. J. Lee, Y. B. Lee, "Determining the number of Clusters in On-Line Document Clustering Algorithm," The KIPS Transactions:PartB, vol. 14, no. 7, pp. 513-522, 2007. DOI: 10.3745/KIPSTB.2007.14.7.513.

[ACM Style]

Tae Chang Jee, Hyun Jin Lee, and Yill Byung Lee. 2007. Determining the number of Clusters in On-Line Document Clustering Algorithm. The KIPS Transactions:PartB, 14, 7, (2007), 513-522. DOI: 10.3745/KIPSTB.2007.14.7.513.