Terminology Recognition System based on Machine Learning for Scientific Document Analysis 


Vol. 18,  No. 5, pp. 329-338, Oct.  2011
10.3745/KIPSTD.2011.18.5.329


  Abstract

Terminology recognition is a prerequisite for text mining, information extraction, information retrieval, the semantic web, and question answering, but it has been studied intensively in only a limited range of domains, especially the biomedical domain. Because previous approaches relied on domain-specific resources, they were difficult to apply to general domains. We propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. The proposed approach achieved an F-score of 80.8, a 6.5% improvement over C-value, a widely used related approach based on local domain frequencies. In a second experiment with various combinations of unithood features, the combination that included NGD (Normalized Google Distance) showed the best performance, with an F-score of 81.8. We applied three machine learning methods, logistic regression, C4.5, and SVMs, and obtained the best score with the decision tree method, C4.5.
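The abstract does not say how the NGD unithood feature is computed. As a rough illustration only, the sketch below follows the standard definition of Normalized Google Distance by Cilibrasi and Vitanyi, which compares the hit counts of two words searched separately with the hit count of both searched together. The function name, its parameters, and the hit counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from search hit counts (illustrative sketch, not the paper's code).

    fx, fy -- hit counts for each word searched alone (hypothetical inputs)
    fxy    -- hit count for both words searched together
    n      -- assumed total number of indexed pages; must exceed fx and fy
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Words that never occur, or never co-occur, are treated as maximally distant.
        return math.inf
    if n <= max(fx, fy):
        raise ValueError("n must be larger than both individual hit counts")

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical counts: two words that always appear together have distance 0.
same = normalized_google_distance(1000, 1000, 1000, 10**6)

# Hypothetical counts: words that rarely co-occur get a larger distance.
apart = normalized_google_distance(1000, 100, 10, 10**6)
```

A smaller distance suggests that the words of a candidate tend to occur together on the Web, which is the kind of signal a unithood feature is meant to capture. How the authors turned such distances into classifier features is not described in the abstract.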


  Cite this article

[IEEE Style]

Y. S. Choi, S. K. Song, H. W. Chun, C. H. Jeong, S. P. Choi, "Terminology Recognition System based on Machine Learning for Scientific Document Analysis," The KIPS Transactions:PartD, vol. 18, no. 5, pp. 329-338, 2011. DOI: 10.3745/KIPSTD.2011.18.5.329.

[ACM Style]

Yun Soo Choi, Sa Kwang Song, Hong Woo Chun, Chang Hoo Jeong, and Sung Pil Choi. 2011. Terminology Recognition System based on Machine Learning for Scientific Document Analysis. The KIPS Transactions:PartD, 18, 5, (2011), 329-338. DOI: 10.3745/KIPSTD.2011.18.5.329.