Improving the Performance of Statistical Automatic Text Categorization by using Phrasal Patterns and Keyword Sets 


Vol. 7, No. 4, pp. 1150-1159, Apr. 2000
10.3745/KIPSTE.2000.7.4.1150


  Abstract

This paper presents an automatic text categorization model that improves accuracy by combining statistical and knowledge-based categorization methods. In our model, we apply the knowledge-based method first, and then apply the statistical method to the texts that are not categorized by the knowledge-based method. By using this combined method, we can improve the accuracy of categorization while categorizing all the texts without failure. For statistical categorization, the vector model with Inverted Category Frequency (ICF) weighting is used. For knowledge-based categorization, Phrasal Patterns and Keyword Sets are introduced to represent sentence patterns, and then pattern matching is performed. Experimental results on new articles show that the accuracy of categorization can be improved by combining the two different categorization methods.
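
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the ICF formula or the pattern representation, so the following are assumptions made here for illustration: ICF is taken by analogy with IDF as log(number of categories / number of categories containing the term), similarity is cosine, a phrasal pattern is an ordered list of keyword-set names, and all names (`KEYWORD_SETS`, `PATTERNS`, `TRAIN`, `categorize`) and data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword sets and phrasal patterns (illustrative only).
KEYWORD_SETS = {"TEAM": {"tigers", "bears"}, "WIN": {"beat", "defeated"}}
PATTERNS = [(["TEAM", "WIN", "TEAM"], "sports")]

# Hypothetical labeled texts for the statistical stage.
TRAIN = [
    ("the tigers beat the bears in the final game", "sports"),
    ("the team won the game with a late goal", "sports"),
    ("the bank raised interest rates on loans", "economy"),
    ("stock prices fell as the market closed", "economy"),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def match_pattern(tokens, slots):
    """True if the tokens contain one member of each keyword set, in order."""
    i = 0
    for tok in tokens:
        if tok in KEYWORD_SETS[slots[i]]:
            i += 1
            if i == len(slots):
                return True
    return False

def train(labeled_texts):
    """Build one tf * icf vector per category (assumed ICF: log(N / cf))."""
    tf = defaultdict(Counter)
    for text, cat in labeled_texts:
        tf[cat].update(tokenize(text))
    cat_freq = Counter()
    for counts in tf.values():
        cat_freq.update(counts.keys())  # number of categories containing each term
    icf = {t: math.log(len(tf) / cf) for t, cf in cat_freq.items()}
    vectors = {c: {t: n * icf[t] for t, n in counts.items()} for c, counts in tf.items()}
    return vectors, icf

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def categorize(text, vectors, icf):
    """Knowledge-based stage first; statistical stage as the fallback."""
    tokens = tokenize(text)
    for slots, cat in PATTERNS:
        if match_pattern(tokens, slots):
            return cat, "knowledge"
    doc = {t: n * icf.get(t, 0.0) for t, n in Counter(tokens).items()}
    # max() always returns some category, so no text is left uncategorized.
    return max(vectors, key=lambda c: cosine(doc, vectors[c])), "statistical"

vectors, icf = train(TRAIN)
```

Calling `categorize("The Tigers defeated the Bears", vectors, icf)` is resolved by the pattern stage, while a text that matches no pattern, such as `"interest rates and stock prices"`, falls through to the ICF-weighted vector comparison. Under the assumed formula, a term that occurs in every category (here "the") receives a weight of zero and so does not influence the statistical stage.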

  Cite this article

[IEEE Style]

J. G. Han, M. G. Park, K. J. Cho, J. T. Kim, "Improving the Performance of Statistical Automatic Text Categorization by using Phrasal Patterns and Keyword Sets," The Transactions of the Korea Information Processing Society (1994 ~ 2000), vol. 7, no. 4, pp. 1150-1159, 2000. DOI: 10.3745/KIPSTE.2000.7.4.1150.

[ACM Style]

Jung Gi Han, Min Gyu Park, Kwang Je Cho, and Jun Tae Kim. 2000. Improving the Performance of Statistical Automatic Text Categorization by using Phrasal Patterns and Keyword Sets. The Transactions of the Korea Information Processing Society (1994 ~ 2000), 7, 4, (2000), 1150-1159. DOI: 10.3745/KIPSTE.2000.7.4.1150.