Automatic Classification of Web documents According to their Styles 


Vol. 11,  No. 5, pp. 555-562, Aug.  2004
10.3745/KIPSTB.2004.11.5.555


  Abstract

Genre, or style, offers a view of documents that is distinct from subject or topic, and it can also serve as a criterion for classifying them. Several studies have addressed style detection in textual documents, but only a few have dealt with web documents. In this paper we propose sets of features for detecting the styles of web documents. Web documents differ from plain textual documents in that they carry URLs and HTML tags. We introduce features specific to web documents, extracted from URLs and HTML tags. Experimental results allow us to evaluate the characteristics and performance of these features.
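As a rough illustration of what URL- and HTML-tag-based features might look like, the following minimal Python sketch derives a few simple values from a page. The abstract does not list the paper's actual feature sets, so every feature name here (`url_depth`, `link_ratio`, and so on) and the `extract_features` helper itself are hypothetical and chosen only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts the opening tags seen while parsing an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this method.
        self.tags[tag] += 1


def extract_features(url, html):
    """Return illustrative URL and HTML-tag features (a hypothetical set,
    not the feature sets used in the paper)."""
    parsed = urlparse(url)
    # Non-empty path segments, e.g. "/news/2004/a.html" has 3.
    segments = [s for s in parsed.path.split("/") if s]

    counter = TagCounter()
    counter.feed(html)
    counter.close()
    total = sum(counter.tags.values())

    return {
        "url_depth": len(segments),
        "url_has_query": bool(parsed.query),
        "num_tags": total,
        "num_links": counter.tags["a"],
        "num_images": counter.tags["img"],
        # Share of all opening tags that are hyperlinks.
        "link_ratio": counter.tags["a"] / total if total else 0.0,
    }
```

A feature dictionary of this kind could then be passed to any standard classifier; which classifier the authors used is not stated in the abstract.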

  Cite this article

[IEEE Style]

K. J. Lee, C. S. Lim, J. H. Kim, "Automatic Classification of Web documents According to their Styles," The KIPS Transactions:PartB, vol. 11, no. 5, pp. 555-562, 2004. DOI: 10.3745/KIPSTB.2004.11.5.555.

[ACM Style]

Kong Joo Lee, Chul Su Lim, and Jae Hoon Kim. 2004. Automatic Classification of Web documents According to their Styles. The KIPS Transactions:PartB, 11, 5, (2004), 555-562. DOI: 10.3745/KIPSTB.2004.11.5.555.