Segmenting and Classifying Korean Words based on Syllables Using Instance-Based Learning


Vol. 10,  No. 1, pp. 47-56, Feb.  2003
10.3745/KIPSTB.2003.10.1.47


  Abstract

Korean, like English, delimits words with white space, but Korean words differ somewhat in structure from English ones. A white-space-delimited unit in English generally consists of a single word, whereas in Korean it consists of one or more words and/or morphemes. Because of this difference, a white-space-delimited unit in Korean is called an Eojeol. We propose a method for segmenting and classifying Korean words and/or morphemes based on syllables using instance-based learning. The feature set for instance-based learning consists of the previous syllable, the current syllable, the next two syllables, the final consonant of the current syllable, and the two previous categories. Our method achieves an F-measure of more than 97% for word segmentation on the ETRI corpus and the KAIST corpus.
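
To make the feature set concrete, the following is a minimal sketch of how syllable-level instances might be built and matched, not the authors' implementation. The abstract lists the features but does not give the tag set, the distance metric, or the value of k, so the "B-"/"I-" tag scheme, the categories "N" and "J", the simple overlap metric, and every function name below are assumptions made for illustration. The final consonant is read from the standard Unicode layout of precomposed Hangul syllables.

```python
from collections import Counter

HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable
PAD = "_"             # hypothetical padding symbol for positions outside the Eojeol


def final_consonant(syllable):
    """Return the final-consonant index (0 = none, 1..27), or -1 for non-Hangul."""
    code = ord(syllable)
    if HANGUL_BASE <= code <= HANGUL_LAST:
        return (code - HANGUL_BASE) % 28
    return -1


def features(syllables, i, prev_tags):
    """Build the seven features named in the abstract for position i."""
    def get(j):
        return syllables[j] if 0 <= j < len(syllables) else PAD

    t2 = prev_tags[-2] if len(prev_tags) >= 2 else PAD
    t1 = prev_tags[-1] if len(prev_tags) >= 1 else PAD
    return (get(i - 1), get(i), get(i + 1), get(i + 2),
            final_consonant(get(i)), t2, t1)


def train(tagged_eojeols):
    """Instance-based learning stores every training instance unchanged."""
    memory = []
    for syllables, tags in tagged_eojeols:
        for i in range(len(syllables)):
            memory.append((features(syllables, i, tags[:i]), tags[i]))
    return memory


def overlap(a, b):
    """Assumed similarity: the number of feature positions that match."""
    return sum(x == y for x, y in zip(a, b))


def classify(memory, feats, k=1):
    """Majority vote among the k stored instances most similar to feats."""
    ranked = sorted(memory, key=lambda inst: overlap(inst[0], feats), reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


def tag(memory, eojeol, k=1):
    """Tag syllables left to right, feeding earlier predictions back as features."""
    tags = []
    for i in range(len(eojeol)):
        tags.append(classify(memory, features(eojeol, i, tags), k))
    return tags


def segment(eojeol, tags):
    """Group syllables into (text, category) units; a 'B-' tag opens a new unit."""
    units = []
    for syllable, t in zip(eojeol, tags):
        if t.startswith("B-") or not units:
            units.append([syllable, t[2:]])
        else:
            units[-1][0] += syllable
    return [(text, category) for text, category in units]


# Toy training data: one Eojeol, a noun followed by a particle (hypothetical tags).
memory = train([("학교에", ["B-N", "I-N", "B-J"])])
predicted = tag(memory, "학교에")
print(segment("학교에", predicted))  # [('학교', 'N'), ('에', 'J')]
```

Because each syllable receives both a boundary marker and a category, segmentation and classification come out of the same tagging pass, which is one plausible reading of "segmenting and classifying" in the abstract. A real system would need a large tagged corpus and most likely a weighted distance metric rather than the plain overlap count used here.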


  Cite this article

[IEEE Style]

J. H. Kim and K. J. Lee, "Segmenting and Classifying Korean Words based on Syllables Using Instance-Based Learning," The KIPS Transactions:PartB, vol. 10, no. 1, pp. 47-56, 2003. DOI: 10.3745/KIPSTB.2003.10.1.47.

[ACM Style]

Jae Hoon Kim and Kong Joo Lee. 2003. Segmenting and Classifying Korean Words based on Syllables Using Instance-Based Learning. The KIPS Transactions:PartB, 10, 1, (2003), 47-56. DOI: 10.3745/KIPSTB.2003.10.1.47.