Comparison of Significant Term Extraction Based on the Number of Selected Principal Components 


Vol. 13, No. 3, pp. 329-336, Jun. 2006
10.3745/KIPSTB.2006.13.3.329


  Abstract

In this paper, we propose a method for extracting significant terms from a document. The method uses Principal Component Analysis (PCA), a multivariate analysis technique. PCA exploits term-term relationships within a document through term-term correlations. We perform PCA on the correlation matrix between terms rather than on the covariance matrix. We also seek thresholds for both the number of components to select and the correlation coefficient between the selected components and the terms. Experimental results on 283 Korean newspaper articles show that selecting the first six components with a correlation coefficient threshold of 0.4 is the best condition for extracting sentences based on the selected significant terms.
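
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a sentence-by-term count matrix as input, computes the term-term correlation matrix, obtains its eigenpairs with a plain Jacobi rotation routine, and keeps every term whose correlation with any of the first few components reaches the threshold. The function names (`correlation_matrix`, `jacobi_eigh`, `select_terms`), the toy counts, and the term labels are all hypothetical.

```python
import math

def correlation_matrix(rows):
    """Term-term Pearson correlations; rows are sentences, columns are term counts."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    # Assumption: every term varies across sentences (non-zero standard deviation).
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(m)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(m)]
            for a in range(m)]

def jacobi_eigh(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                if diff != 0:
                    theta = 0.5 * math.atan(2 * a[p][q] / diff)
                else:
                    theta = math.copysign(math.pi / 4, a[p][q])
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # rotate columns p and q of a and v
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                    vip, viq = v[i][p], v[i][q]
                    v[i][p], v[i][q] = c * vip - s * viq, s * vip + c * viq
                for j in range(n):  # rotate rows p and q of a
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * apj - s * aqj, s * apj + c * aqj
    return [a[i][i] for i in range(n)], v

def select_terms(rows, terms, n_components, threshold):
    """Keep terms whose |correlation| with any leading component >= threshold."""
    vals, vecs = jacobi_eigh(correlation_matrix(rows))
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    order = order[:n_components]
    selected = []
    for j, term in enumerate(terms):
        # For correlation-matrix PCA, the correlation between term j and
        # component k equals eigvec[j][k] * sqrt(eigval[k]).
        loadings = [vecs[j][k] * math.sqrt(max(vals[k], 0.0)) for k in order]
        if any(abs(x) >= threshold for x in loadings):
            selected.append(term)
    return selected

# Hypothetical toy data: 4 sentences x 3 terms.
terms = ["economy", "market", "weather"]
counts = [[1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
print(select_terms(counts, terms, n_components=1, threshold=0.4))
# ['economy', 'market']
```

In this toy data "economy" and "market" always co-occur, so they load fully on the first component, while "weather" is uncorrelated with both and is excluded. The abstract's reported setting would correspond to `n_components=6` and `threshold=0.4` on a real document.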

  Cite this article

[IEEE Style]

C. B. Lee, C. Y. Ock, H. R. Park, "Comparison of Significant Term Extraction Based on the Number of Selected Principal Components," The KIPS Transactions:PartB, vol. 13, no. 3, pp. 329-336, 2006. DOI: 10.3745/KIPSTB.2006.13.3.329.

[ACM Style]

Chang Beom Lee, Cheol Young Ock, and Hyuk Ro Park. 2006. Comparison of Significant Term Extraction Based on the Number of Selected Principal Components. The KIPS Transactions:PartB, 13, 3, (2006), 329-336. DOI: 10.3745/KIPSTB.2006.13.3.329.