An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT 


Vol. 14, No. 2, pp. 169-180, Apr. 2007
10.3745/KIPSTD.2007.14.2.169


  Abstract

This paper presents a new unsupervised XML document clustering technique based on a function transform and the FFT (Fast Fourier Transform). An XML document is first transformed into a discrete function based on the hierarchical nesting structure of its elements. The discrete function is then transformed into vectors using the FFT. The vectors of two documents are compared using a weighted Euclidean distance metric. If the distance is lower than a pre-specified threshold, the two documents are considered structurally similar and are grouped into the same cluster. XML clustering can be useful for storing and searching XML documents. Experiments were conducted with 800 synthetic documents and with 520 real documents. The experiments showed that the function transform and FFT are effective for the incremental, unsupervised clustering of structurally similar XML documents.
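The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact function transform, the weights, or the threshold, so everything below is an assumption made for illustration. In particular, the sketch assumes one sample per element equal to its nesting depth (in document order), zero-padding or truncation to a fixed length `size`, FFT magnitudes as the vector, uniform weights, and a first-fit incremental assignment against each cluster's first member. All function names are hypothetical.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def depth_sequence(xml_text):
    """Assumed function transform: nesting depth of each element, in document order."""
    samples = []

    def walk(node, depth):
        samples.append(float(depth))
        for child in node:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 1)
    return samples


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrum(xml_text, size=16):
    """Pad or truncate the depth sequence to `size` samples and return FFT magnitudes."""
    samples = depth_sequence(xml_text)[:size]
    samples += [0.0] * (size - len(samples))
    return [abs(c) for c in fft(samples)]


def weighted_distance(u, v, weights):
    """Weighted Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def cluster(docs, threshold, size=16):
    """Incremental clustering: join the first cluster whose representative is close enough."""
    weights = [1.0] * size  # assumption: uniform weights
    clusters = []  # list of (representative vector, member indices)
    for index, doc in enumerate(docs):
        vec = spectrum(doc, size)
        for rep, members in clusters:
            if weighted_distance(vec, rep, weights) < threshold:
                members.append(index)
                break
        else:
            clusters.append((vec, [index]))
    return [members for _, members in clusters]
```

Because this assumed encoding depends only on nesting depth, two documents with the same shape but different tag names map to the same vector, while a flat document and a deeply nested one differ in their spectra. Each new document is compared only against existing cluster representatives, which is what makes the procedure incremental.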


  Cite this article

[IEEE Style]

H. S. Lee, "An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT," The KIPS Transactions: Part D, vol. 14, no. 2, pp. 169-180, 2007. DOI: 10.3745/KIPSTD.2007.14.2.169.

[ACM Style]

Ho Suk Lee. 2007. An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT. The KIPS Transactions: Part D, 14, 2, (2007), 169-180. DOI: 10.3745/KIPSTD.2007.14.2.169.