Performance Improvement by a Virtual Documents Technique in Text Categorization 


Vol. 11,  No. 4, pp. 501-508, Aug.  2004
10.3745/KIPSTB.2004.11.4.501


  Abstract

This paper proposes a virtual relevant document technique for the learning phase of text categorization. The method applies a simple transformation to the relevant documents: it creates virtual documents by combining pairs of documents in the training set. A virtual document produced this way has an enriched term vector, with greater weights for the terms that co-occur in the two relevant documents. The experimental results show a significant improvement over the baseline, which supports the usefulness of the proposed method: a 71% improvement on the TREC-11 filtering test collection and an 11% improvement on the Reuters-21578 test set for topics with fewer than 100 relevant documents, in micro-averaged F1. Analysis of the results indicates that adding virtual relevant documents contributes to a steady improvement in performance.
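
The pairing step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the combination or weighting function, so this example assumes raw term-frequency vectors that are summed for each pair of relevant documents. The names `term_vector` and `make_virtual_documents` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Build a raw term-frequency vector using whitespace tokenization (an assumption)."""
    return Counter(text.lower().split())


def make_virtual_documents(relevant_docs):
    """Return one virtual term vector per unordered pair of relevant documents.

    Assumption: the pair is combined by summing term frequencies, so a term
    that occurs in both documents ends up with a larger weight than a term
    that occurs in only one of them.
    """
    vectors = [term_vector(doc) for doc in relevant_docs]
    return [a + b for a, b in combinations(vectors, 2)]


# Toy relevant documents for a single topic (invented for illustration).
docs = ["oil price rises", "oil export falls", "price of oil"]
virtual_docs = make_virtual_documents(docs)
```

With n relevant documents this produces n(n-1)/2 virtual documents, which could then be added to the training set alongside the originals. In the first virtual vector above, the shared term "oil" has weight 2 while terms found in only one document have weight 1.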


  Cite this article

[IEEE Style]

K. S. Lee and D. U. An, "Performance Improvement by a Virtual Documents Technique in Text Categorization," The KIPS Transactions: Part B, vol. 11, no. 4, pp. 501-508, 2004. DOI: 10.3745/KIPSTB.2004.11.4.501.

[ACM Style]

Kyung Soon Lee and Dong Un An. 2004. Performance Improvement by a Virtual Documents Technique in Text Categorization. The KIPS Transactions: Part B, 11, 4, (2004), 501-508. DOI: 10.3745/KIPSTB.2004.11.4.501.