eISSN : 3022-7011
ISSUER : KIPS
After the Korea Information Processing Society (KIPS) Transactions journal was founded in 1994, it was reorganized in 2012 into KIPS Transactions: Computer and Communication Systems (2287-5891/2734-049X) and KIPS Transactions: Software and Data Engineering (2287-5905/2734-0503). At the official KIPS meeting on January 8th, 2024, the new KIPS Transactions journal was founded by integrating these two journals. The new journal aims to realize social value and contribute to the development of South Korea’s science and technology with support from the lottery fund of the Ministry of Strategy and Finance and the science/technology promotion fund of the Ministry of Science and ICT. It is indexed in the Korea Science Academic Database, Korea Citation Index (KCI), and EBSCO.
Highlights
Diffusion-based Audio-to-Visual Generation for High-Quality Bird Images
Adel Toleubekova Joo Yong Shim XinYu Piao Jong-Kook Kim
Accurately identifying bird species from their vocalizations and generating corresponding bird images is still a challenging task due to limited training data and environmental noise in audio data. To address this limitation, this paper introduces a...
Bambda: A Framework for Preventing Function Invocation Condition-Based Attacks in Serverless Environments
Shin Chang Hee Lee Seung Soo
Serverless computing is rapidly emerging as a new paradigm in cloud computing, offering automatic scalability, cost efficiency, and ease of operation. However, its two core characteristics—IAM-based privilege management and event-driven execution—ca...
PEFT Methods for Domain Adaptation
Lee You Jin Yoon Kyung Koo Chung Woo Dam
This study analyzed that the biggest obstacle in deploying Large Language Models (LLMs) in industrial settings is incorporating domain specificity into the models. To mitigate this issue, the study compared model performance when domain knowledge wa...
Label Differential Privacy Study for Privacy Protection in Multimodal Contrastive Learning Model
Youngseo Kim Minseo Yu Younghan Lee Ho Bae
Recent advancements in multimodal deep learning have garnered significant attention from both academia and industry due to their exceptional accuracy and ability to learn rich knowledge representations. In particular, contrastive learning based appr...
Latest Publication (Vol. 15, No. 2, Feb. 2026)
Optimizing Randomized Benchmarking in Quantum Cloud Computing: Increasing Speed and Reliability with Conditional-Based Circuit Integration
Choi Na Gyeong Han Young Sun
https://doi.org/10.3745/TKIPS.2026.15.2.75
Quantum Computing, Quantum Cloud Computing, Randomized Benchmarking
Randomized benchmarking (RB), which evaluates the average error rate of quantum gates, requires running a large
number of circuits repeatedly to obtain reliable results. However, on cloud-based quantum computing platforms, each circuit is submitted
as an independent unit of work and scheduled in a first-in, first-out (FIFO) fashion, which lengthens the overall experiment time and
weakens the temporal consistency of the measurement environment. To address these structural inefficiencies, this paper proposes an
integrated execution approach that combines RB sequences into a single circuit and utilizes conditional branching to perform individual
experiments within it. The proposed technique reduces the delay caused by queuing and provides more consistent experimental conditions
by reducing the execution gap between sequences. In experiments on the IBM Quantum platform, the proposed method reduced the
total execution time by approximately 71% compared to the conventional method. The proportion of pending queues decreased by about
65%. Additionally, the variability in execution time and waiting time was significantly reduced, improving the temporal stability of the
experimental environment, which contributed to enhancing the reliability of measurement results by ensuring consistency in noise
conditions.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 75-84, Feb. 2026
NBS and DRL Based Dual-Stage Spectrum Allocation Technique for the IoE Network
Shim Woong Gi Kim Sung Wook
https://doi.org/10.3745/TKIPS.2026.15.2.85
Internet of Everything, spectrum allocation, Nash Bargaining Solution, Deep Reinforcement Learning, game theory
This study proposes a novel wireless spectrum allocation method aimed at ensuring smooth sharing of limited spectrum resources,
minimizing data latency, and enabling efficient data transmission. The Nash Bargaining Solution (NBS) is a mathematical approach that
fairly distributes limited resources while maximizing the benefits for all negotiating parties. Additionally, reinforcement learning provides
a method to efficiently allocate spectrum by learning network conditions and demand variations in dynamic environments, thereby
minimizing resource waste and optimizing network performance. Combining the two, the proposed method employs a two-stage control
mechanism that leverages the concepts of the Nash Bargaining Solution and the Double Deep Q-Network (DDQN) reinforcement learning
algorithm. In the first stage, spectrum pricing is dynamically determined, and spectrum allocation is performed based on the Nash
Bargaining Solution. In the second stage, individual IoE devices select their spectrum requests through deep reinforcement learning.
Through sequential interactions among intelligent IoE devices, the proposed two-stage control approach explores synergies to optimize
the spectrum allocation process. Finally, simulation results demonstrate that the jointly designed control method effectively guides individual
devices to select cooperative strategies beneficial to overall system efficiency. The method outperforms existing protocols in terms of
service delay, network throughput, and fairness among devices.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 85-94, Feb. 2026
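The abstract above describes the Nash Bargaining Solution only in general terms. As a rough illustration, the sketch below computes the textbook NBS for the simplest case, where each device's utility is assumed to be linear in the bandwidth it receives. The function name, the units, and the disagreement values are hypothetical, and this is not the paper's pricing or allocation scheme.

```python
def nash_bargaining_split(total, disagreement):
    """Split `total` among players with linear utilities u_i = x_i.

    The NBS maximizes prod(x_i - d_i) subject to sum(x_i) == total and
    x_i >= d_i. For linear utilities the maximizer gives every player its
    disagreement payoff d_i plus an equal share of the remaining surplus.
    """
    surplus = total - sum(disagreement)
    if surplus < 0:
        raise ValueError("total cannot cover the disagreement payoffs")
    share = surplus / len(disagreement)
    return [d + share for d in disagreement]


# Hypothetical example: 20 units of spectrum, three devices whose minimum
# acceptable allocations (disagreement points) are 2, 4, and 5 units.
allocation = nash_bargaining_split(20.0, [2.0, 4.0, 5.0])
print(allocation)  # [5.0, 7.0, 8.0]
```

With non-linear utilities there is generally no closed form, and the product of utility gains would have to be maximized numerically.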
Enhancing Robustness to Errors in AI-based Speech Recognition Kiosks
Ji Soo Ryu Soon Ki Jung
https://doi.org/10.3745/TKIPS.2026.15.2.95
AI-based Speech Recognition Kiosks, Speech Recognition Robustness, Natural Language Processing, Error Correction and Utterance Filtering, Intent and Entity Recognition
This study proposes an integrated approach to enhance robustness to errors in AI-based speech recognition kiosks. While the adoption
of voice-enabled kiosks has been rapidly increasing in unmanned service environments, recognition errors caused by background noise,
diverse intonations, and pronunciation variations remain a critical barrier to reliable performance. In this research, the baseline speech
recognition accuracy, which stagnated at approximately 60% when using only basic Natural Language Processing (NLP) techniques, was
improved to over 94% by incorporating data augmentation, utterance filtering, context-based intent analysis, and error correction
algorithms. The system preprocesses speech input and applies correction strategies to classify and mitigate frequent recognition errors.
Experimental results demonstrate that the proposed method consistently improves recognition accuracy across diverse user environments,
achieving a level of robustness suitable for real-world kiosk deployment. This study offers a practical solution that advances the quality
of AI-driven speech interfaces and contributes to enhancing the accessibility and user experience of voice-based unmanned services.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 95-101, Feb. 2026
Exploring Gender Bias in Fashion Descriptions in Machine Translation to Korean
Yo-Han Park Min-Seok Cho Kong Joo Lee
https://doi.org/10.3745/TKIPS.2026.15.2.102
machine translation, prompt, Gender Bias, Fashion Vocabulary
Large Language Models (LLMs) have advanced translation, summarization, and QA, yet often embed social biases that threaten fairness
and reliability. This study examines gender bias in machine translation with gender-neutral source languages and Korean as the target.
We designed prompts focusing on clothing and colors, analyzing outputs from Google Translate and DeepL. Using Jensen–Shannon
Divergence (JSD) and the Unadjusted Concordance Assessment (UCA), we evaluated how gender distributions varied with clothing types
and color information. Results reveal consistent associations (e.g., dresses and skirts with female, suits with male) and strong bias effects
from colors, especially “pink.” Bias patterns also differed across systems and languages, reflecting dataset and model characteristics. This
work provides a framework for measuring gender bias in Korean MT and highlights fashion descriptions as a domain where stereotypes
are pronounced, offering a methodology extendable to other contexts.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 102-112, Feb. 2026
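For readers unfamiliar with the Jensen–Shannon Divergence named in the abstract above, the following is a minimal sketch of its standard definition, using base-2 logarithms so that the value lies between 0 and 1. The two distributions are made-up placeholders over (female, male, neutral) outputs, not data from the study.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m is the average of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical gender distributions of translations for two prompt variants.
baseline = [0.40, 0.40, 0.20]
with_pink = [0.75, 0.10, 0.15]
print(jensen_shannon_divergence(baseline, with_pink))
```

A value of 0 means the two distributions are identical, and 1 means they share no support.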
Performance Analysis of Video–Audio Action Recognition Using a Cross-Attention-Based Multimodal Fusion Architecture
Jun Hwa Kim
https://doi.org/10.3745/TKIPS.2026.15.2.113
multimodal learning, Action Recognition, audio-visual fusion, Cross attention, Transformer
This study analyzes the performance of various fusion strategies for audio-visual action recognition, focusing on cross-attention as
the core mechanism. Visual and audio features are extracted using Swin-Transformer-based encoders, and four fusion architectures are
designed by sequentially applying simple operations (concatenation, summation, multiplication), cross-attention, channel-wise gating, and
self-attention. All models are evaluated on the Kinetics-Sound dataset under consistent training conditions. Experimental results show
that multimodal fusion improves performance by up to 13 percentage points compared to single-modal baselines. In particular,
cross-attention effectively learns semantic alignment between visual and audio modalities, contributing to improved accuracy. The final
model incorporating self-attention achieves a Top-1 Accuracy of 87.20% and an F1-score of 87.02%. This study provides practical insights
into the design of efficient multimodal fusion architectures that capture complex interactions between visual and auditory modalities.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 113-120, Feb. 2026
Automatic Table-of-Contents Generation in Scholarly Documents via Joint Layout Analysis and OCR
Sang-Baek Lee Wonjun Choi Jae-Wook Seol Hye-Jin Lee
https://doi.org/10.3745/TKIPS.2026.15.2.121
Automatic TOC generation, document layout analysis, DocLayout-YOLO, OCR, Section refinement, Scholarly documents
This study presents a pipeline for automatic table-of-contents (TOC) generation in scholarly documents with complex structures by
combining image-based document layout analysis and OCR. A DocLayout-YOLO–based scholarly-information structuring model jointly
detects ten components—sections (chapter/section/subsection), body text, tables, figures, formulas, page markers, bibliography heading,
and bibliography region—and then performs region-level OCR on detected section candidates. We further apply a Section-Depth
Refinement algorithm to adapt to document-specific notation conventions and improve section-level accuracy. Trained and evaluated
on a dataset built from domestic science-and-technology R&D reports, the proposed system demonstrates reliable end-to-end TOC
generation across diverse formats, including scanned PDFs.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 121-129, Feb. 2026
A MICE-Doubly Robust Causal Inference Pipeline for High-Dimensional Observational Data: Analyzing Productivity Effects of Digital Transformation in Korean Manufacturing
Lee Seog Min
https://doi.org/10.3745/TKIPS.2026.15.2.130
Causal Inference, Multiple Imputation, Doubly Robust Estimation, High-Dimensional Data, Digital Transformation
Causal inference in high-dimensional observational data requires addressing the dual challenges of missing data and model
misspecification. We developed a pipeline that systematically integrates Multiple Imputation by Chained Equations (MICE) with the doubly
robust Augmented Inverse Probability Weighting (AIPW) estimator to evaluate the productivity effects of digital transformation using data
from the Korean Business Activity Survey (n=31,572, p=281, 2019–2023). By employing a MICE strategy that excludes the treatment variable,
we increased the sample size by 76% (from 17,897 to 31,572), while K=5 cross-validation in AIPW estimation mitigated the risk of model
misspecification. Our findings indicate that digital transformation yields an average total factor productivity (TFP) increase of 3.9%. This
effect was particularly pronounced in the pre-pandemic period (+5.3%) and in the electronics (+8.6%) and chemical (+13.5%) industries.
Robustness was confirmed through additional tests, including placebo tests and subsample analyses. Compared to Targeted Maximum
Likelihood Estimation (TMLE), AIPW provided nearly identical estimates while computing 27% faster.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 130-138, Feb. 2026
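The AIPW estimator mentioned in the abstract above has a standard closed form. The sketch below evaluates it, assuming the propensity scores and outcome-model predictions have already been produced elsewhere. It omits the MICE imputation and the K-fold cross-fitting the abstract describes, and all variable names and numbers are illustrative.

```python
def aipw_ate(y, t, e, mu1, mu0):
    """Augmented inverse probability weighting estimate of the average treatment effect.

    y   : observed outcomes
    t   : treatment indicators (1 = treated, 0 = control)
    e   : estimated propensity scores P(T = 1 | X), strictly between 0 and 1
    mu1 : outcome-model predictions under treatment
    mu0 : outcome-model predictions under control
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0):
        total += (
            m1 - m0                              # outcome-model contrast
            + ti * (yi - m1) / ei                # weighted residual correction for treated units
            - (1 - ti) * (yi - m0) / (1 - ei)    # weighted residual correction for control units
        )
    return total / len(y)


# Toy data in which the outcome models happen to fit the observed outcomes exactly,
# so the correction terms vanish and the estimate equals mean(mu1 - mu0).
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
mu1 = [3.0, 2.0, 4.0, 3.0]
mu0 = [2.0, 1.0, 3.0, 2.0]
print(aipw_ate(y, t, e, mu1, mu0))  # 1.0
```

The estimator is called doubly robust because it remains consistent if either the propensity model or the outcome model is correctly specified.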
RMSSD-Based Risk Prediction Using Wearable Sensor
Ha-Neul Kim Young-Seob Jeong Jeung-Im Kim
https://doi.org/10.3745/TKIPS.2026.15.2.139
Heart Rate Variability, Wearable Devices, Time-Series Forecasting, Machine Learning, Risk Prediction
As the use of wearable devices continues to expand, research utilizing daily collected biosignals to assess autonomic nervous system
activity, stress levels, and cardiovascular health has been actively conducted. This study proposes a model to predict the 24-hour-ahead
value of RMSSD (Root Mean Square of Successive Differences), a key indicator of HRV (Heart Rate Variability), using heart rate and step
count data obtained from wearable sensors. Public wearable datasets provided by the Jeju Free International City Development Center
were used, and data from 95 adult participants were included in the analysis. Various statistical features were derived from heart rate
and step count signals to construct the input variables, and 14 machine learning and time-series forecasting algorithms were compared.
The results showed that before data augmentation, the XGBoost model achieved an RMSE of 8.69 and an MAE of 7.05, while after
augmentation, the Random Forest model yielded the lowest prediction error with an RMSE of 8.65 and an MAE of 7.01. These findings
demonstrate that wearable-derived biosignals can be effectively used to predict future RMSSD values and provide a foundation for
developing personalized health management systems.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 139-145, Feb. 2026
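RMSSD, the prediction target in the abstract above, is conventionally computed from successive inter-beat (RR) intervals. The sketch below shows that standard definition only. The input values are invented, and the abstract does not say how the dataset's RMSSD values were derived.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (milliseconds)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals: successive differences are +10 ms and -20 ms.
print(rmssd([800, 810, 790]))  # sqrt((100 + 400) / 2) = sqrt(250), about 15.81
```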
Health Evaluation for Collaborative Robot Using Vision-Based Motion Data
Hui Chan Yang Min Seo Choi Jin Se Kim Jung won Lee
https://doi.org/10.3745/TKIPS.2026.15.2.146
Collaborative Robot, predictive maintenance, Health Evaluation, Vision Data
Collaborative robots require health evaluation to prevent human injury during human–robot collaboration. However, their programmable
characteristics result in diverse tasks and complex data patterns, while existing methods relying on internal sensor data face limitations
due to differences in sensor types and units across devices. This study proposes a vision-based health evaluation method that overcomes
task diversity and is independent of internal sensor data. The proposed method collects normal-state motion data by designing a
workspace-based test program for the target collaborative robot, trains an LSTM-based prediction model using the data, and assesses
robot health by measuring the similarity between actual and predicted motion signals. Applied to the Indy7 collaborative robot by
Neuromeka, the proposed method achieved 87.79% accuracy in distinguishing normal and degraded states. Parameter tuning for normal
range boundaries yielded up to 100% true positive and 99.5% true negative rates, confirming its capability for quantitative and user-oriented
health evaluation.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 146-159, Feb. 2026
Integrated Collision Detection Framework Based on Pose, Depth, and Object Estimation Using a Single RGB Camera
Ye Chan Kim Min Ju Hong Hosang Yoo Byeong Soo Kim
https://doi.org/10.3745/TKIPS.2026.15.2.160
3D Spatial Analysis, collision detection, depth estimation, Object Detection, Pose Estimation, Single RGB Camera
Ensuring worker safety in dynamic industrial environments requires collision detection systems that are fast, accurate, and cost-efficient.
Conventional solutions based on LiDAR or RGB-D sensors deliver high precision but are costly, maintenance-heavy, and sensitive to
environmental conditions such as poor lighting, occlusion, and reflective surfaces. To address these limitations, this study proposes a
lightweight collision detection framework using only a single RGB camera. The system integrates YOLOv8 for object detection, ViTPose
for human pose estimation, and Depth Anything for monocular depth estimation to reconstruct 3D positions of human joints and object
centers in real time. A dedicated collision detection algorithm computes Euclidean distances between these 3D coordinates, while temporal
smoothing and joint-structure-based outlier correction enhance stability and reliability. The proposed framework was validated on a custom
dataset of 55 scenario videos under diverse lighting conditions, camera angles, and interaction types. Experimental results show that the
system achieved an F1-score of 1.0 across all scenarios, indicating high accuracy and robustness. These findings demonstrate the potential
of single RGB camera systems as practical, scalable, and cost-effective alternatives to expensive sensor-based approaches, enabling
real-time collision detection for industrial safety monitoring.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 160-168, Feb. 2026
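The distance check described in the abstract above can be pictured with a small sketch. It assumes 3D coordinates for one human joint and one object center are already available in meters, and it stands in a plain moving average for the unspecified temporal smoothing. The class name, threshold, and window size are hypothetical and not taken from the paper.

```python
import math
from collections import deque


class ProximityMonitor:
    """Flags a potential collision when the smoothed joint-object distance drops below a threshold."""

    def __init__(self, threshold_m=0.5, window=3):
        self.threshold_m = threshold_m      # assumed alert distance in meters
        self.recent = deque(maxlen=window)  # most recent raw distances

    def update(self, joint_xyz, object_xyz):
        # Euclidean distance between the two 3D points for the current frame.
        self.recent.append(math.dist(joint_xyz, object_xyz))
        # Moving average over the window damps single-frame depth noise.
        smoothed = sum(self.recent) / len(self.recent)
        return smoothed < self.threshold_m


monitor = ProximityMonitor(threshold_m=0.5, window=2)
for obj in [(1.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.2, 0.0, 0.0)]:
    print(monitor.update((0.0, 0.0, 0.0), obj))
```

Smoothing trades a frame or two of latency for fewer false alarms, which is why the alert in this example fires only on the third frame.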
A Study on the Multimodal Fraud Transaction Detection Model Based on Financial Transactions and Signature Activities
Chan-sik Sung Kwan-yeol Park Tae-yang Park
https://doi.org/10.3745/TKIPS.2026.15.2.169
Space-time attention learning, context embedding learning, time series correlation mismatch learning, fusion learning, multimodal data learning
With the proliferation of non-face-to-face financial transactions, detecting abnormal transactions and signature forgery or alteration
is increasingly important. Existing approaches rely on single-modal data and therefore do not sufficiently reflect the dynamic
characteristics of signature behavior. This study proposes a multimodal artificial intelligence model (MAIFDM) that integrates and
analyzes transaction data, signature images, and handwriting behavior time series. MAIFDM combines time-space attention learning, context
embedding learning, and time-series correlation mismatch learning to fuse the features of the three modules and then determines whether
there is an abnormality using the Mahalanobis distance and an adaptive dynamic threshold. In experiments, MAIFDM outperformed
the existing model, achieving an F1-score of 0.907 and an AUC of 0.942, indicating that it is effective for
multimodal data learning and fraudulent transaction detection.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 169-179, Feb. 2026
BioVectorField: Learning Transcriptomic Vector Fields from Irregularly Sampled scRNA-seq to Map Deterioration–Recovery Dynamics
Junku Kim Kyuri Jo
https://doi.org/10.3745/TKIPS.2026.15.2.180
scRNA-seq, irregular time-series, vector field regression, deterioration–recovery dynamics
We present BioVectorField (BioVF), a framework that learns gene-expression dynamics underlying disease progression from irregularly
sampled single-cell RNA-seq time-series data. For each patient, single-cell profiles were aggregated into cell-type–specific pseudo-bulk
representations at each timepoint, projected into a shared PCA space, and linked across consecutive intervals to extract temporal changes.
BioVF models these changes to reconstruct a continuous transcriptional vector field and visualizes latent biological trajectories spanning
the deterioration-to-recovery continuum of disease. Because the learned field encodes the directionality of temporal gene-expression
changes, it enables the construction of quantitative metrics for discriminating disease phases such as deterioration and recovery. When
applied to Korean COVID-19 patient data, BioVF successfully recapitulated key features of the clinical deterioration–recovery trajectory
and demonstrated predictive utility. Furthermore, GSEA revealed that genes contributing to the vector-field–based predictions were
associated with coherent immune activation and resolution processes in high-dimensional transcriptional space.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 180-187, Feb. 2026
