HMM-based Korean Named Entity Recognition 


Vol. 10, No. 2, pp. 229-236, Apr. 2003
10.3745/KIPSTB.2003.10.2.229


  Abstract

Named entity recognition is indispensable to question answering and information extraction systems. This paper presents an HMM-based named entity (NE) recognition method that uses the construction principles of compound words. In Korean, many named entities can be decomposed into more than one word. Moreover, there are contextual relationships among the nouns in an NE, and between an NE and its surrounding words. We classify each word as an NE in itself, a word in an NE, and/or a word adjacent to an NE, and train an HMM on these NE-related word types and parts of speech. The proposed named entity recognition (NER) system uses a trigram HMM to handle the variable length of NEs. However, the trigram model suffers from a serious data sparseness problem, which we address with multi-level back-offs. Experimental results show that our NER system achieves an F-measure of 87.6% on economic articles.
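The back-off idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the back-off levels, so this example assumes the simplest scheme (trigram, then bigram, then unigram relative frequencies, with no discounting, so the scores are not a normalized distribution). The tag names `B`, `I`, `A`, `S`, `O` and the functions `train_counts` and `backoff_prob` are hypothetical stand-ins for NE-related word types.

```python
from collections import Counter

START = "<s>"  # padding symbol for sentence starts (an assumption of this sketch)

def train_counts(tag_sequences):
    """Collect trigram, bigram, and unigram counts over tag sequences."""
    tri, tri_ctx = Counter(), Counter()
    bi, bi_ctx = Counter(), Counter()
    uni = Counter()
    total = 0
    for tags in tag_sequences:
        padded = [START, START] + list(tags)
        for i in range(2, len(padded)):
            t2, t1, t = padded[i - 2], padded[i - 1], padded[i]
            tri[(t2, t1, t)] += 1
            tri_ctx[(t2, t1)] += 1
            bi[(t1, t)] += 1
            bi_ctx[t1] += 1
            uni[t] += 1
            total += 1
    return tri, tri_ctx, bi, bi_ctx, uni, total

def backoff_prob(t2, t1, t, model):
    """Score tag t given the two previous tags, backing off when counts are zero."""
    tri, tri_ctx, bi, bi_ctx, uni, total = model
    if tri[(t2, t1, t)] > 0:   # level 1: trigram relative frequency
        return tri[(t2, t1, t)] / tri_ctx[(t2, t1)]
    if bi[(t1, t)] > 0:        # level 2: bigram relative frequency
        return bi[(t1, t)] / bi_ctx[t1]
    if total > 0:              # level 3: unigram relative frequency
        return uni[t] / total
    return 0.0

# Toy training data with hypothetical NE-related tags.
model = train_counts([["B", "I", "A"], ["O", "S", "A"]])
score = backoff_prob("B", "S", "A", model)  # unseen trigram, falls back to the bigram
```

A production system would typically discount the higher-order estimates (as in Katz back-off) so that the backed-off scores sum to one; that step is omitted here for brevity.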

  Cite this article

[IEEE Style]

Y. G. Hwang and B. H. Yun, "HMM-based Korean Named Entity Recognition," The KIPS Transactions: Part B, vol. 10, no. 2, pp. 229-236, 2003. DOI: 10.3745/KIPSTB.2003.10.2.229.

[ACM Style]

Yi Gzu Hwang and Bo Hyun Yun. 2003. HMM-based Korean Named Entity Recognition. The KIPS Transactions: Part B, 10, 2, (2003), 229-236. DOI: 10.3745/KIPSTB.2003.10.2.229.