Web Document Classification Based on Hangeul Morpheme and Keyword Analyses 


Vol. 19,  No. 4, pp. 263-270, Aug.  2012
10.3745/KIPSTD.2012.19.4.263


  Abstract

With the development of high-speed Internet and large-scale database technology, the number of web documents is increasing rapidly, and classifying those documents automatically is becoming increasingly important. In this study, we propose an effective method that extracts document features based on Hangeul morpheme and keyword analyses and classifies unstructured documents automatically by predicting their subjects. To extract document features, we first select terms using a morpheme analyzer, then form the keyword set based on term frequency and subject-discriminating power, and finally score each keyword using its discriminating power. We then generate the classification model using commercial software that implements the decision tree, neural network, and support vector machine (SVM). Experimental results show that the proposed feature extraction method performs well in classifying web documents by subject: with the decision tree, it achieves an average precision of 0.90 and an average recall of 0.84.
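The keyword selection and scoring steps described above can be illustrated with a minimal Python sketch. The abstract does not define the subject-discriminating power, so the measure used here (the share of a term's occurrences that fall in its dominant subject) is an assumption made purely for illustration, as are the function names, the `min_tf` and `min_power` thresholds, and the toy data. The terms are assumed to have already been produced by a morpheme analyzer; English tokens stand in for Hangeul morphemes.

```python
from collections import Counter, defaultdict


def build_keyword_scores(docs, min_tf=2, min_power=0.7):
    """Select keywords and score them.

    docs: list of (subject, terms) pairs, where terms is a list of
    tokens assumed to come from a morpheme analyzer.
    Returns {term: (dominant_subject, power)}.
    """
    total_tf = Counter()                 # term frequency over all documents
    per_subject = defaultdict(Counter)   # term frequency per subject
    for subject, terms in docs:
        total_tf.update(terms)
        per_subject[subject].update(terms)

    scores = {}
    for term, total in total_tf.items():
        if total < min_tf:               # drop rare terms
            continue
        # Hypothetical discriminating power: fraction of the term's
        # occurrences that belong to its most frequent subject.
        subject, count = max(
            ((s, c[term]) for s, c in per_subject.items()),
            key=lambda pair: pair[1],
        )
        power = count / total
        if power >= min_power:           # keep only discriminating terms
            scores[term] = (subject, power)
    return scores


def document_features(terms, scores, subjects):
    """Sum keyword scores per subject to form a feature vector."""
    features = {s: 0.0 for s in subjects}
    for term in terms:
        if term in scores:
            subject, power = scores[term]
            features[subject] += power
    return features


# Toy corpus (illustrative only, not data from the study).
docs = [
    ("sports", ["soccer", "match", "player"]),
    ("sports", ["soccer", "match"]),
    ("economy", ["stock", "match", "market"]),
    ("economy", ["stock", "market"]),
]
scores = build_keyword_scores(docs)
features = document_features(
    ["stock", "market", "match"], scores, ["sports", "economy"]
)
```

In this toy corpus, "player" is dropped for low frequency and "match" is dropped because it appears under both subjects. The resulting per-subject feature vector is the kind of input that could then be passed to a decision tree, neural network, or SVM.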


  Cite this article

[IEEE Style]

S. L. Lee, D. H. Park, W. S. Choi, H. J. Kim, "Web Document Classification Based on Hangeul Morpheme and Keyword Analyses," The KIPS Transactions:PartD, vol. 19, no. 4, pp. 263-270, 2012. DOI: 10.3745/KIPSTD.2012.19.4.263.

[ACM Style]

Seok Lyong Lee, Dan Ho Park, Won Sik Choi, and Hong Jo Kim. 2012. Web Document Classification Based on Hangeul Morpheme and Keyword Analyses. The KIPS Transactions:PartD, 19, 4, (2012), 263-270. DOI: 10.3745/KIPSTD.2012.19.4.263.