Automatic Table-of-Contents Generation in Scholarly Documents via Joint Layout Analysis and OCR

This study presents a pipeline that automatically generates a table of contents (TOC) for structurally complex scholarly documents by combining image-based document layout analysis with OCR. A scholarly-information structuring model based on DocLayout-YOLO jointly detects ten component types: three section levels (chapter, section, subsection), body text, tables, figures, formulas, page markers, the bibliography heading, and the bibliography region. The pipeline then runs region-level OCR on the detected section candidates and applies a Section-Depth Refinement algorithm, which adapts to document-specific notation conventions and improves section-level accuracy. Trained and evaluated on a dataset built from domestic science-and-technology R&D reports, the system demonstrates reliable end-to-end TOC generation across diverse formats, including scanned PDFs.
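
The final assembly step can be illustrated with a minimal sketch. The abstract does not specify how Section-Depth Refinement works, so the rule used here (let dotted numbering such as "2.1" override the detector's class) is only an assumed stand-in, and the names `Heading`, `refine_depth`, `build_toc`, and the class labels are hypothetical.

```python
import re
from dataclasses import dataclass

# Detector class -> default TOC depth (class names are assumed, not from the paper).
DEFAULT_DEPTH = {"chapter": 1, "section": 2, "subsection": 3}

# Matches a leading dotted number such as "2", "2.1", or "2.1.3".
NUMBERING = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+")


@dataclass
class Heading:
    label: str  # class predicted by the layout detector
    text: str   # OCR output for the detected region
    page: int   # 1-based page number


def refine_depth(h: Heading) -> int:
    """Assumed rule: prefer the depth implied by dotted numbering,
    otherwise fall back to the detector's class label."""
    m = NUMBERING.match(h.text)
    if m:
        return m.group(1).count(".") + 1
    return DEFAULT_DEPTH[h.label]


def build_toc(headings: list[Heading]) -> list[str]:
    """Order headings by page and render one indented line per entry."""
    ordered = sorted(headings, key=lambda h: h.page)  # stable: keeps in-page order
    return [
        f"{'  ' * (refine_depth(h) - 1)}{h.text.strip()} ... {h.page}"
        for h in ordered
    ]


if __name__ == "__main__":
    detected = [
        Heading("chapter", "1. Introduction", 1),
        Heading("chapter", "1.1 Background", 2),  # detector label disagrees with numbering
        Heading("chapter", "References", 9),
    ]
    print("\n".join(build_toc(detected)))
```

In this sketch the second heading is demoted to depth 2 because its numbering has two components, even though the hypothetical detector labeled it a chapter. Unnumbered headings keep the depth implied by their class.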

Cite this article

[IEEE Style]

S. Lee, W. Choi, J. Seol, H. Lee, "Automatic Table-of-Contents Generation in Scholarly Documents via Joint Layout Analysis and OCR," The Transactions of the Korea Information Processing Society, vol. 15, no. 2, pp. 121-129, 2026. DOI: https://doi.org/10.3745/TKIPS.2026.15.2.121.

[ACM Style]

Sang-Baek Lee, Wonjun Choi, Jae-Wook Seol, and Hye-Jin Lee. 2026. Automatic Table-of-Contents Generation in Scholarly Documents via Joint Layout Analysis and OCR. The Transactions of the Korea Information Processing Society, 15, 2, (2026), 121-129. DOI: https://doi.org/10.3745/TKIPS.2026.15.2.121.
