Improving Object Detection via Lightweight Cross-Attention-based Semantic Alignment 


Vol. 14, No. 9, pp. 668-676, Sep. 2025
https://doi.org/10.3745/TKIPS.2025.14.9.668


  Abstract

Accurate detection of objects at varying scales requires effective multi-scale feature representation learning. To this end, most modern object detectors adopt Feature Pyramid Network (FPN)-based feature fusion strategies. However, because feature levels differ in semantic granularity and information content, fusing them directly often causes semantic misalignment, which increases false positives and limits detection performance. In this paper, we propose a lightweight cross-attention-based semantic alignment module that aligns adjacent feature levels before fusion. The module uses semantically weak low-level features as queries and semantically rich high-level features as keys and values, enabling it to model semantic relationships between levels. To keep the module computationally efficient and suitable for real-time use, the attention sequence length is constrained according to the lowest-resolution feature map. We integrate the proposed module into both conventional and real-time object detectors and evaluate it on the MS COCO and PASCAL VOC datasets. Experimental results show consistent improvements in AP and AP50, validating the effectiveness and generality of our approach.
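To make the query/key/value roles concrete, the following is a minimal pure-Python sketch of cross-attention alignment between two adjacent pyramid levels. It is not the authors' implementation. The abstract does not specify learned projections, head count, normalization, or exactly how the sequence length is constrained, so the sketch assumes identity projections, a single head, and average pooling of both levels to a fixed grid (e.g. the size of the lowest-resolution level), followed by nearest-neighbour upsampling and a residual add. All function names (`avg_pool`, `to_tokens`, `cross_attention`, `align_low_to_high`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]   # one channel: [H][W]
FeatureMap = List[Matrix]    # full map: [C][H][W]


def avg_pool(fm: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Average-pool each channel to out_h x out_w (assumes H, W divisible by out_h, out_w)."""
    h, w = len(fm[0]), len(fm[0][0])
    sh, sw = h // out_h, w // out_w
    return [
        [[sum(ch[i * sh + di][j * sw + dj] for di in range(sh) for dj in range(sw)) / (sh * sw)
          for j in range(out_w)]
         for i in range(out_h)]
        for ch in fm
    ]


def to_tokens(fm: FeatureMap) -> List[List[float]]:
    """Flatten [C][H][W] into H*W tokens (row-major), each a length-C vector."""
    c, h, w = len(fm), len(fm[0]), len(fm[0][0])
    return [[fm[k][i][j] for k in range(c)] for i in range(h) for j in range(w)]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention; identity Q/K/V projections (assumption)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)                      # subtract max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(wt * v[k] for wt, v in zip(weights, values)) for k in range(d)])
    return out


def align_low_to_high(low: FeatureMap, high: FeatureMap, seq_h: int, seq_w: int) -> FeatureMap:
    """Hypothetical alignment: low-level queries attend to high-level keys/values.

    Both maps are pooled to seq_h x seq_w, so attention runs over seq_h*seq_w tokens
    regardless of the low-level resolution. The attended tokens are upsampled
    (nearest neighbour) and added residually to the low-level map.
    """
    q = to_tokens(avg_pool(low, seq_h, seq_w))
    kv = to_tokens(avg_pool(high, seq_h, seq_w))
    attended = cross_attention(q, kv, kv)
    c, h, w = len(low), len(low[0]), len(low[0][0])
    sh, sw = h // seq_h, w // seq_w
    return [
        [[low[k][i][j] + attended[(i // sh) * seq_w + (j // sw)][k] for j in range(w)]
         for i in range(h)]
        for k in range(c)
    ]


if __name__ == "__main__":
    low = [[[float(i + j) for j in range(4)] for i in range(4)] for _ in range(2)]  # C=2, 4x4
    high = [[[1.0, -1.0], [0.5, 0.0]], [[0.0, 2.0], [1.0, 1.0]]]                   # C=2, 2x2
    aligned = align_low_to_high(low, high, seq_h=2, seq_w=2)
    print(len(aligned), len(aligned[0]), len(aligned[0][0]))
```

Under these assumptions, the attention cost grows as O(N^2 C) with N = seq_h * seq_w, so fixing N to the coarsest level's size keeps the cost independent of the finer level's resolution. Whether the paper pools, downsamples with strided convolutions, or uses another mechanism to reach that length is not stated in the abstract.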


  Cite this article

[IEEE Style]

H. Lee, J. Lee, W. Kang, "Improving Object Detection via Lightweight Cross-Attention-based Semantic Alignment," The Transactions of the Korea Information Processing Society, vol. 14, no. 9, pp. 668-676, 2025. DOI: https://doi.org/10.3745/TKIPS.2025.14.9.668.

[ACM Style]

Hyungseop Lee, Jiho Lee, and Woochul Kang. 2025. Improving Object Detection via Lightweight Cross-Attention-based Semantic Alignment. The Transactions of the Korea Information Processing Society, 14, 9, (2025), 668-676. DOI: https://doi.org/10.3745/TKIPS.2025.14.9.668.