Design and Implementation of OCR Correction Model for Numeric Digits based on a Context Sensitive and Multiple Streams 


Vol. 18, No. 1, pp. 67-80, Feb. 2011
10.3745/KIPSTD.2011.18.1.67


  Abstract

In an automated business document processing system that maintains financial data, errors in query-based retrieval of numbers are critical to the overall performance and usability of the system. Automatic spelling correction methods have emerged and have played an important role in the development of information retrieval systems. However, the scope of these methods has been limited to symbols, such as alphabetic letter strings, that can be stored in the form of trainable templates or a custom dictionary. Numbers, on the other hand, are sequences of digits that cannot be stored in a dictionary; they are pure Markov sequences. In this paper, we propose a new OCR model for spelling correction of numbers that uses multiple streams and context-based correction on top of a probabilistic information retrieval framework. We implemented the proposed error correction model as a sub-module and integrated it into an existing automated invoice document processing system. We also present comparative test results indicating that our model significantly improves the overall precision of the system.

  Cite this article

[IEEE Style]

H. K. Shin, "Design and Implementation of OCR Correction Model for Numeric Digits based on a Context Sensitive and Multiple Streams," The KIPS Transactions:PartD, vol. 18, no. 1, pp. 67-80, 2011. DOI: 10.3745/KIPSTD.2011.18.1.67.

[ACM Style]

Hyun Kyung Shin. 2011. Design and Implementation of OCR Correction Model for Numeric Digits based on a Context Sensitive and Multiple Streams. The KIPS Transactions:PartD, 18, 1, (2011), 67-80. DOI: 10.3745/KIPSTD.2011.18.1.67.