Spam Filter by Using X2 Statistics and Support Vector Machines 

Vol. 17, No. 3, pp. 249-254, Jun. 2010


We propose an automatic spam filter for e-mail based on Support Vector Machines (SVM). We use the lexical form of a word and its part-of-speech (POS) tag as features, and select features using the chi-square (χ²) statistic. For the experiments, we represent each feature with TF (term frequency), TF-IDF, and binary weights. The SVM is trained on the selected features and then classifies each e-mail as spam or non-spam. In our experiments, feature selection improved the performance of the system, and we achieved an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
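The feature selection and weighting steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the standard 2x2 contingency form of the chi-square statistic for a feature and the spam class, a hypothetical `word/POS` string encoding for features, and hypothetical function names (`chi_square`, `select_features`, `weight_vector`). The resulting vectors would then be passed to an SVM trainer, which is omitted here.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score from a 2x2 contingency table (assumed standard form).

    a: spam documents containing the feature
    b: non-spam documents containing the feature
    c: spam documents without the feature
    d: non-spam documents without the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k features with the highest chi-square score.

    docs: list of token lists, each token a hypothetical "word/POS" string
    labels: list of bools, True for spam
    """
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    spam_df, ham_df = Counter(), Counter()
    for tokens, is_spam in zip(docs, labels):
        for feature in set(tokens):  # document frequency, not raw counts
            (spam_df if is_spam else ham_df)[feature] += 1
    scores = {}
    for feature in set(spam_df) | set(ham_df):
        a, b = spam_df[feature], ham_df[feature]
        scores[feature] = chi_square(a, b, n_spam - a, n_ham - b)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


def weight_vector(tokens, vocab, scheme, doc_freq=None, n_docs=0):
    """Represent one e-mail over the selected features.

    scheme: "binary", "tf", or "tfidf" (tf * log(n_docs / df), one common variant)
    """
    counts = Counter(tokens)
    vector = []
    for feature in vocab:
        tf = counts[feature]
        if scheme == "binary":
            vector.append(1.0 if tf > 0 else 0.0)
        elif scheme == "tf":
            vector.append(float(tf))
        elif scheme == "tfidf":
            df = doc_freq.get(feature, 0)
            vector.append(tf * math.log(n_docs / df) if df else 0.0)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return vector


# Toy data, invented purely for illustration.
docs = [
    ["free/JJ", "money/NN"],
    ["free/JJ", "offer/NN"],
    ["meeting/NN", "money/NN"],
    ["meeting/NN", "agenda/NN"],
]
labels = [True, True, False, False]
vocab = select_features(docs, labels, k=2)
```

On the toy data, `free/JJ` and `meeting/NN` each occur in only one class, so they receive the highest scores, while `money/NN`, which is evenly split between the classes, scores zero.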


  Cite this article

[IEEE Style]

S. W. Lee, "Spam Filter by Using X2 Statistics and Support Vector Machines," The KIPS Transactions:PartB, vol. 17, no. 3, pp. 249-254, 2010. DOI: 10.3745/KIPSTB.2010.17.3.249.

[ACM Style]

Song Wook Lee. 2010. Spam Filter by Using X2 Statistics and Support Vector Machines. The KIPS Transactions:PartB, 17, 3, (2010), 249-254. DOI: 10.3745/KIPSTB.2010.17.3.249.