Enhancing Table Understanding Performance through Question Variation and Table Image Merging Techniques 


Vol. 15, No. 4, pp. 341-348, Apr. 2026
https://doi.org/10.3745/TKIPS.2026.15.4.341


  Abstract

Recent LLM-based QA systems primarily rely on text serialization to process table data; however, this approach often leads to hallucinations because 2D structural information is lost during conversion. To address this limitation, this study systematically compares the performance of Vision-Language Models (VLMs) against text-only approaches and proposes two data augmentation techniques, ‘Question Variation’ and ‘Table Image Merging’, to enhance the table understanding capabilities of VLMs. Experiments conducted on 10 diverse benchmarks reveal that text-only models, even when equipped with high-performance OCR (Kanana), perform poorly on structurally complex tasks. In contrast, the proposed table image merging technique significantly improves structural understanding, while question variation enhances generation capabilities. The integrated model, based on Qwen2-VL, achieves performance comparable to or surpassing that of the table-specific Table-LLaVA 7B model. This study demonstrates the necessity of VLMs for effective table understanding and presents a practical methodology for optimizing performance through efficient data augmentation.
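
The structural loss attributed to text serialization above can be illustrated with a minimal sketch. It is not taken from the study: the `flatten` helper and both example tables are hypothetical, and the serialization shown is a deliberately naive one (cells joined in row-major order with no row delimiters). Under that assumption, two tables with different layouts become the same text, so a text-only model could not tell them apart.

```python
def flatten(table):
    """Naively serialize a table: join every cell in row-major order.

    Hypothetical helper for illustration only; it drops row boundaries,
    which is the kind of 2D information a serializer can lose.
    """
    return " ".join(cell for row in table for cell in row)


# Two made-up tables holding the same cells in different 2D layouts.
wide = [
    ["Year", "Sales", "Profit"],
    ["2023", "10", "2"],
]
tall = [
    ["Year", "Sales"],
    ["Profit", "2023"],
    ["10", "2"],
]

same_text = flatten(wide) == flatten(tall)  # serialized strings compared
same_grid = wide == tall                    # original layouts compared

print(same_text, same_grid)
```

Serializers used in practice usually keep row delimiters, so the collapse is rarely this complete; the sketch only shows why layout cues such as row boundaries or merged cells may not survive a conversion to plain text.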

  Cite this article

[IEEE Style]

M. Shin and Y. Shin, "Enhancing Table Understanding Performance through Question Variation and Table Image Merging Techniques," The Transactions of the Korea Information Processing Society, vol. 15, no. 4, pp. 341-348, 2026. DOI: https://doi.org/10.3745/TKIPS.2026.15.4.341.

[ACM Style]

Mirr Shin and Youhyun Shin. 2026. Enhancing Table Understanding Performance through Question Variation and Table Image Merging Techniques. The Transactions of the Korea Information Processing Society, 15, 4, (2026), 341-348. DOI: https://doi.org/10.3745/TKIPS.2026.15.4.341.