A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features 


Vol. 15, No. 1, pp. 61-72, Feb. 2008
10.3745/KIPSTB.2008.15.1.61


  Abstract

This paper presents a spam filter system based on the maximum entropy model that uses co-training with spamminess features and URL features. Spamminess features are the emphasizing or abnormal patterns that spammers use in spam messages to express their intent and to evade spam filters. Because spammers use URLs to provide details and alter the URL format to avoid blacklist filtering, normal and abnormal URLs can be key features for detecting spam messages. Co-training with spamminess features and URL features uses two feature sets that are independent of each other during training, so the filter system can learn from each of them independently. Experimental results on the TREC spam test collection show that the proposed approach improves accuracy by 9.1% over the base system and by 6.9% over the Bogofilter system. Analysis of the results shows that the proposed spamminess features and URL features are helpful. The co-training experiment shows that the two feature sets are useful: the number of training documents is reduced while accuracy remains close to that of batch learning.
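The co-training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic co-training loop in which two classifiers, one per feature view, take turns pseudo-labeling the unlabeled message they are most confident about. The binary maximum entropy model is written as logistic regression (the two are equivalent for two classes), and the feature vectors, function names, and hyperparameters below are all hypothetical.

```python
import math

def predict_proba(w, x):
    """Return P(spam | x) under a binary maximum entropy (logistic) model."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))  # w[-1] is the bias
    z = max(-30.0, min(30.0, z))                      # guard against overflow
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(X, y, epochs=200, lr=0.5):
    """Fit the model by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            for j, v in enumerate(xi):
                w[j] += lr * err * v
            w[-1] += lr * err
    return w

def fit_on_labeled(view, labels):
    """Train on the messages that currently carry a (pseudo-)label."""
    idx = [i for i, l in enumerate(labels) if l is not None]
    return train_maxent([view[i] for i in idx], [labels[i] for i in idx])

def co_train(view_a, view_b, labels, rounds=3, per_round=1):
    """labels[i] is 1 (spam), 0 (ham), or None for an unlabeled message."""
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            unlabeled = [i for i, l in enumerate(labels) if l is None]
            if not unlabeled:
                break
            w = fit_on_labeled(view, labels)
            # Most confident first: probability farthest from 0.5.
            ranked = sorted(unlabeled,
                            key=lambda i: abs(predict_proba(w, view[i]) - 0.5),
                            reverse=True)
            for i in ranked[:per_round]:
                labels[i] = 1 if predict_proba(w, view[i]) >= 0.5 else 0
    return fit_on_labeled(view_a, labels), fit_on_labeled(view_b, labels), labels

# Toy data (made up for illustration): two views of six messages.
# View A stands in for spamminess features, view B for URL features.
view_a = [[1.0, 1.0], [0.0, 0.0], [0.9, 0.8], [0.1, 0.0], [0.8, 0.9], [0.0, 0.2]]
view_b = [[1.0, 1.0], [0.0, 0.0], [0.8, 1.0], [0.0, 0.1], [1.0, 0.7], [0.2, 0.0]]
seed_labels = [1, 0, None, None, None, None]  # only two messages are labeled

w_a, w_b, final_labels = co_train(view_a, view_b, seed_labels)
print(final_labels)
```

Only two of the six toy messages start with labels; the rest are filled in by whichever view is most confident at each step, which mirrors the abstract's point that fewer labeled training documents are needed when the two feature sets can teach each other.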


  Cite this article

[IEEE Style]

M. G. Gong and K. S. Lee, "A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features," The KIPS Transactions:PartB, vol. 15, no. 1, pp. 61-72, 2008. DOI: 10.3745/KIPSTB.2008.15.1.61.

[ACM Style]

Mi Gyoung Gong and Kyung Soon Lee. 2008. A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features. The KIPS Transactions:PartB, 15, 1, (2008), 61-72. DOI: 10.3745/KIPSTB.2008.15.1.61.