Automatically Converting HTML Documents with Similar Pattern into XML Documents 


Vol. 9, No. 3, pp. 355-364, Jun. 2002
10.3745/KIPSTD.2002.9.3.355


Abstract

Recently, the World Wide Web (WWW) has become a source of a large amount of information and is now recognized not only as an information-sharing tool but also as an information repository. Currently, most documents on the web are written in HTML (Hypertext Markup Language). Although HTML is simple and easy to learn, its inherent inability to describe document structure makes it difficult to retrieve information effectively. One possible solution is to convert such HTML documents into XML (eXtensible Markup Language) documents. XML is a standard markup language for exchanging data on the web, and it can describe document structure freely through a user-defined DTD (Document Type Definition). This makes it possible to integrate, store, and retrieve web data efficiently. In this paper, we propose a converter that automatically converts HTML documents sharing a similar pattern into XML documents by analyzing their document structure and recognizing their path information.
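The abstract does not spell out the conversion algorithm. The following is a minimal Python sketch, using only the standard library, of one way root-to-node path information could drive such a conversion. It is not the authors' converter: the indexed-path scheme, the rule that text varying across similar pages marks a data field, the label-based naming heuristic, and the function names `extract`, `find_fields`, `name_fields`, and `to_xml` are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: path-based HTML-to-XML conversion for pages that share
# one template. Not the method from the paper; names and heuristics are assumed.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements have no end tag, so they are never pushed onto the path.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects (path, text) pairs; path is the root-to-node tag sequence."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.pairs = []   # (path, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag (tolerates unclosed tags).
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.pairs.append(("/".join(self.stack), text))


def extract(html):
    """Map each indexed path, e.g. 'html/body/table/tr/td[1]', to its text."""
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    seen, result = {}, {}
    for path, text in parser.pairs:
        i = seen.get(path, 0)          # occurrence index of this path
        seen[path] = i + 1
        result[f"{path}[{i}]"] = text
    return result


def find_fields(docs):
    """Assumed rule: paths shared by all pages whose text varies are data;
    paths with identical text on every page are template labels."""
    maps = [extract(d) for d in docs]
    return [p for p in maps[0]
            if all(p in m for m in maps[1:])
            and len({m[p] for m in maps}) > 1]


def name_fields(doc, fields):
    """Hypothetical heuristic: name a field after the label text just before it."""
    values = extract(doc)
    keys = list(values)
    names = {}
    for n, path in enumerate(fields):
        i = keys.index(path)
        label = values[keys[i - 1]] if i > 0 and keys[i - 1] not in fields else ""
        tag = "".join(c for c in label.lower().replace(" ", "_")
                      if c.isalnum() or c == "_")
        # Fall back to a generic name if the label is unusable as an XML tag.
        names[path] = tag if tag and not tag[0].isdigit() else f"field{n}"
    return names


def to_xml(doc, names, root_tag="record"):
    """Emit one XML record holding the text of each named field."""
    values = extract(doc)
    root = ET.Element(root_tag)
    for path, tag in names.items():
        if path in values:
            ET.SubElement(root, tag).text = values[path]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    pages = [
        "<html><body><h1>Book</h1><table><tr><td>Title</td><td>XML Basics</td></tr>"
        "<tr><td>Price</td><td>20</td></tr></table></body></html>",
        "<html><body><h1>Book</h1><table><tr><td>Title</td><td>Web Data</td></tr>"
        "<tr><td>Price</td><td>35</td></tr></table></body></html>",
    ]
    names = name_fields(pages[0], find_fields(pages))
    for page in pages:
        print(to_xml(page, names, "book"))
    # prints:
    # <book><title>XML Basics</title><price>20</price></book>
    # <book><title>Web Data</title><price>35</price></book>
```

Comparing several pages is what makes the "similar pattern" assumption useful here: text that stays constant at a given path ("Title", "Price") is treated as template, while text that changes is treated as content. A real converter would also need to handle repeated records, optional fields, and a DTD that fixes the element names, none of which this sketch attempts.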


Cite this article

[IEEE Style]

K. Y. Oh and E. J. Hwang, "Automatically Converting HTML Documents with Similar Pattern into XML Documents," The KIPS Transactions: Part D, vol. 9, no. 3, pp. 355-364, 2002. DOI: 10.3745/KIPSTD.2002.9.3.355.

[ACM Style]

Keum Yong Oh and Een Jun Hwang. 2002. Automatically Converting HTML Documents with Similar Pattern into XML Documents. The KIPS Transactions: Part D, 9, 3, (2002), 355-364. DOI: 10.3745/KIPSTD.2002.9.3.355.