eISSN : 3022-7011
ISSUER : KIPS
 
After the Korea Information Processing Society (KIPS) Transactions journal was founded in 1994, it was reorganized into the KIPS Transactions: Computer and Communication Systems (2287-5891/2734-049X) and the KIPS Transactions: Software and Data Engineering (2287-5905/2734-0503) in 2012. At the official KIPS meeting on January 8, 2024, the new KIPS Transaction journal was founded by merging these two journals. The new journal aims to realize social value and contribute to the development of South Korea’s science and technology with support from the lottery fund of the Ministry of Strategy and Finance and the science/technology promotion fund of the Ministry of Science and ICT. It is indexed in the Korea Science Academic Database, Korea Citation Index (KCI), and EBSCO.

Highlights

Diffusion-based Audio-to-Visual Generation for High-Quality Bird Images

Adel Toleubekova  Joo Yong Shim  XinYu Piao  Jong-Kook Kim

Accurately identifying bird species from their vocalizations and generating corresponding bird images is still a challenging task due to limited training data and environmental noise in audio data. To address this limitation, this paper introduces a...

Bambda: A Framework for Preventing Function Invocation Condition-Based Attacks in Serverless Environments

Shin Chang Hee  Lee Seung Soo

Serverless computing is rapidly emerging as a new paradigm in cloud computing, offering automatic scalability, cost efficiency, and ease of operation. However, its two core characteristics—IAM-based privilege management and event-driven execution—ca...

PEFT Methods for Domain Adaptation

Lee You Jin  Yoon Kyung Koo  Chung Woo Dam

This study identifies incorporating domain specificity into the models as the biggest obstacle to deploying Large Language Models (LLMs) in industrial settings. To mitigate this issue, the study compared model performance when domain knowledge wa...

Label Differential Privacy Study for Privacy Protection in Multimodal Contrastive Learning Model

Youngseo Kim  Minseo Yu  Younghan Lee  Ho Bae

Recent advancements in multimodal deep learning have garnered significant attention from both academia and industry due to their exceptional accuracy and ability to learn rich knowledge representations. In particular, contrastive learning based appr...

Latest Publication (Vol. 15, No. 2, Feb. 2026)

Optimizing Randomized Benchmarking in Quantum Cloud Computing: Increasing Speed and Reliability with Conditional-Based Circuit Integration
Choi Na Gyeong  Han Young Sun
Randomized benchmarking (RB), which evaluates the average error rate of quantum gates, requires running a large number of circuits repeatedly to obtain reliable results. However, on cloud-based quantum computing platforms, each circuit is submitted as an independent unit of work and scheduled in a first-in, first-out (FIFO) fashion, which lengthens the overall experiment time and weakens the temporal consistency of the measurement environment. To address these structural inefficiencies, this paper proposes an integrated execution approach that combines RB sequences into a single circuit and utilizes conditional branching to perform individual experiments within it. The proposed technique reduces the delay caused by queuing and provides more consistent experimental conditions by reducing the execution gap between sequences. In experiments on the IBM Quantum platform, the proposed method reduced the total execution time by approximately 71% compared to the conventional method. The proportion of pending queues decreased by about 65%. Additionally, the variability in execution time and waiting time was significantly reduced, improving the temporal stability of the experimental environment, which contributed to enhancing the reliability of measurement results by ensuring consistency in noise conditions.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 75-84, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.75
Quantum Computing Quantum Cloud Computing Randomized Benchmarking
NBS and DRL Based Dual-Stage Spectrum Allocation Technique for the IoE Network
Shim Woong Gi  Kim Sung Wook
This study proposes a novel wireless spectrum allocation method aimed at ensuring smooth sharing of limited spectrum resources, minimizing data latency, and enabling efficient data transmission. The Nash Bargaining Solution (NBS) is a mathematical approach that fairly distributes limited resources while maximizing the benefits for all negotiating parties. Reinforcement learning, in turn, allocates spectrum efficiently by learning network conditions and demand variations in dynamic environments, thereby minimizing resource waste and optimizing network performance. Building on both, the proposed method employs a two-stage control mechanism that leverages the concepts of the Nash Bargaining Solution and the Double Deep Q-Network (DDQN) reinforcement learning algorithm. In the first stage, spectrum pricing is dynamically determined, and spectrum allocation is performed based on the Nash Bargaining Solution. In the second stage, individual IoE devices select their spectrum requests through deep reinforcement learning. Through sequential interactions among intelligent IoE devices, the proposed two-stage control approach explores synergies to optimize the spectrum allocation process. Simulation results demonstrate that the jointly designed control method effectively guides individual devices to select cooperative strategies beneficial to overall system efficiency. The method outperforms existing protocols in terms of service delay, network throughput, and fairness among devices.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 85-94, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.85
Internet of Everything spectrum allocation Nash Bargaining Solution Deep Reinforcement Learning game theory
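The Nash Bargaining Solution mentioned in the abstract above can be illustrated with a small sketch. It assumes linear utilities (utility = rate x allocated spectrum) and fixed disagreement points, for which the Nash product has a closed-form maximizer. The function names, parameters, and numbers are hypothetical and are not the pricing or allocation model of the paper.

```python
def nash_product(shares, rates, disagreement):
    """Product of each device's gain over its disagreement utility."""
    product = 1.0
    for x, a, d in zip(shares, rates, disagreement):
        product *= a * x - d
    return product


def nash_bargaining_split(total, rates, disagreement):
    """Closed-form Nash Bargaining split for linear utilities u_i = rates[i] * x_i.

    Assumption: utilities are linear, so maximizing the Nash product
    gives every device its minimum share plus an equal part of the surplus.
    """
    n = len(rates)
    if n == 0 or n != len(disagreement):
        raise ValueError("rates and disagreement must be non-empty and equal length")
    # Minimum share each device needs to reach its disagreement utility.
    floors = [d / a for a, d in zip(rates, disagreement)]
    surplus = total - sum(floors)
    if surplus < 0:
        raise ValueError("total is too small to cover every disagreement point")
    return [f + surplus / n for f in floors]


# Hypothetical example: 10 units of spectrum shared by two devices.
rates = [1.0, 2.0]
disagreement = [1.0, 2.0]
shares = nash_bargaining_split(10.0, rates, disagreement)
print(shares, nash_product(shares, rates, disagreement))
```

Under these assumptions, any other feasible split of the same total yields a smaller Nash product, which is the sense in which the solution is considered fair.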
Enhancing Robustness to Errors in AI-based Speech Recognition Kiosks
Ji Soo Ryu  Soon Ki Jung
This study proposes an integrated approach to enhance robustness to errors in AI-based speech recognition kiosks. While the adoption of voice-enabled kiosks has been rapidly increasing in unmanned service environments, recognition errors caused by background noise, diverse intonations, and pronunciation variations remain a critical barrier to reliable performance. In this research, the baseline speech recognition accuracy, which stagnated at approximately 60% when using only basic Natural Language Processing (NLP) techniques, was improved to over 94% by incorporating data augmentation, utterance filtering, context-based intent analysis, and error correction algorithms. The system preprocesses speech input and applies correction strategies to classify and mitigate frequent recognition errors. Experimental results demonstrate that the proposed method consistently improves recognition accuracy across diverse user environments, achieving a level of robustness suitable for real-world kiosk deployment. This study offers a practical solution that advances the quality of AI-driven speech interfaces and contributes to enhancing the accessibility and user experience of voice-based unmanned services.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 95-101, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.95
AI-based Speech Recognition Kiosks Speech Recognition Robustness Natural Language Processing Error Correction and Utterance Filtering Intent and Entity Recognition
Exploring Gender Bias in Fashion Descriptions in Machine Translation to Korean
Yo-Han Park  Min-Seok Cho  Kong Joo Lee
Large Language Models (LLMs) have advanced translation, summarization, and QA, yet often embed social biases that threaten fairness and reliability. This study examines gender bias in machine translation with gender-neutral source languages and Korean as the target. We designed prompts focusing on clothing and colors, analyzing outputs from Google Translate and DeepL. Using Jensen–Shannon Divergence (JSD) and the Unadjusted Concordance Assessment (UCA), we evaluated how gender distributions varied with clothing types and color information. Results reveal consistent associations (e.g., dresses and skirts with female, suits with male) and strong bias effects from colors, especially “pink.” Bias patterns also differed across systems and languages, reflecting dataset and model characteristics. This work provides a framework for measuring gender bias in Korean MT and highlights fashion descriptions as a domain where stereotypes are pronounced, offering a methodology extendable to other contexts.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 102-112, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.102
machine translation prompt Gender Bias Fashion Vocabulary
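As an illustration of the Jensen–Shannon Divergence used in the abstract above to compare gender distributions, here is a minimal sketch. The function name, the base-2 logarithm, and the example distributions are assumptions for illustration and do not reproduce the authors' code or data.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution halfway between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical shares of (female, male, neutral) outputs for two prompts.
baseline = [0.30, 0.30, 0.40]
with_color = [0.70, 0.10, 0.20]
print(round(js_divergence(baseline, with_color), 4))
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (disjoint distributions), which makes scores comparable across prompts.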
Performance Analysis of Video–Audio Action Recognition Using a Cross-Attention-Based Multimodal Fusion Architecture
Jun Hwa Kim
This study analyzes the performance of various fusion strategies for audio-visual action recognition, focusing on cross-attention as the core mechanism. Visual and audio features are extracted using Swin-Transformer-based encoders, and four fusion architectures are designed by sequentially applying simple operations (concatenation, summation, multiplication), cross-attention, channel-wise gating, and self-attention. All models are evaluated on the Kinetics-Sound dataset under consistent training conditions. Experimental results show that multimodal fusion improves performance by up to 13 percentage points compared to single-modal baselines. In particular, cross-attention effectively learns semantic alignment between visual and audio modalities, contributing to improved accuracy. The final model incorporating self-attention achieves a Top-1 Accuracy of 87.20% and an F1-score of 87.02%. This study provides practical insights into the design of efficient multimodal fusion architectures that capture complex interactions between visual and auditory modalities.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 113-120, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.113
multimodal learning Action Recognition audio-visual fusion Cross attention Transformer
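The cross-attention mechanism at the core of the abstract above can be sketched as plain scaled dot-product attention in which queries come from one modality and keys and values from the other. This is a simplified illustration only: it omits learned projections, multiple heads, gating, and the Swin-Transformer encoders, and all names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: vectors from one modality (for example, visual tokens).
    keys, values: vectors from the other modality (for example, audio tokens).
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output per query.
        fused = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(fused)
    return outputs


# Hypothetical toy features: two visual tokens attend over three audio tokens.
visual = [[1.0, 0.0], [0.0, 1.0]]
audio_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_values = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(cross_attention(visual, audio_keys, audio_values))
```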
Automatic Table-of-Contents Generation in Scholarly Documents via Joint Layout Analysis and OCR
Sang-Baek Lee  Wonjun Choi  Jae-Wook Seol  Hye-Jin Lee
This study presents a pipeline for automatic table-of-contents (TOC) generation in scholarly documents with complex structures by combining image-based document layout analysis and OCR. A DocLayout-YOLO–based scholarly-information structuring model jointly detects ten components—sections (chapter/section/subsection), body text, tables, figures, formulas, page markers, bibliography heading, and bibliography region—and then performs region-level OCR on detected section candidates. We further apply a Section-Depth Refinement algorithm to adapt to document-specific notation conventions and improve section-level accuracy. Trained and evaluated on a dataset built from domestic science-and-technology R&D reports, the proposed system demonstrates reliable end-to-end TOC generation across diverse formats, including scanned PDFs.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 121-129, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.121
Automatic TOC generation document layout analysis DocLayout-YOLO OCR Section refinement Scholarly documents
A MICE-Doubly Robust Causal Inference Pipeline for High-Dimensional Observational Data: Analyzing Productivity Effects of Digital Transformation in Korean Manufacturing
Lee Seog Min
Causal inference in high-dimensional observational data requires addressing the dual challenges of missing data and model misspecification. We developed a pipeline that systematically integrates Multiple Imputation by Chained Equations (MICE) with the doubly robust Augmented Inverse Probability Weighting (AIPW) estimator to evaluate the productivity effects of digital transformation using data from the Korean Business Activity Survey (n=31,572, p=281, 2019–2023). By employing a MICE strategy that excludes the treatment variable, we increased the sample size by 76% (from 17,897 to 31,572), while K=5 cross-validation in AIPW estimation mitigated the risk of model misspecification. Our findings indicate that digital transformation yields an average total factor productivity (TFP) increase of 3.9%. This effect was particularly pronounced in the pre-pandemic period (+5.3%) and in the electronics (+8.6%) and chemical (+13.5%) industries. Robustness was confirmed through additional tests, including placebo tests and subsample analyses. Compared to Targeted Maximum Likelihood Estimation (TMLE), AIPW provided nearly identical estimates with a 27% faster computation time.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 130-138, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.130
Causal Inference Multiple Imputation Doubly Robust Estimation High-Dimensional Data Digital Transformation
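The doubly robust AIPW estimator named in the abstract above combines an outcome model with a propensity model. The sketch below implements the standard AIPW formula for the average treatment effect, given predictions that are assumed to have been fitted elsewhere. The function name, argument names, and toy numbers are hypothetical, and the MICE imputation and cross-validation steps described in the abstract are not shown.

```python
def aipw_ate(y, t, mu1, mu0, e):
    """Augmented inverse probability weighting estimate of the average treatment effect.

    y:   observed outcomes
    t:   treatment indicators (1 = treated, 0 = control)
    mu1: predicted outcome under treatment for each unit
    mu0: predicted outcome under control for each unit
    e:   estimated propensity scores, strictly between 0 and 1
    """
    n = len(y)
    if n == 0 or not all(len(v) == n for v in (t, mu1, mu0, e)):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for yi, ti, m1, m0, ei in zip(y, t, mu1, mu0, e):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Outcome-model contrast plus inverse-probability-weighted residual corrections.
        total += (m1 - m0) + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
    return total / n


# Hypothetical toy data: four units, two treated.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
mu1 = [3.0, 2.0, 4.0, 3.0]
mu0 = [2.0, 1.0, 3.0, 2.0]
e = [0.5, 0.5, 0.5, 0.5]
print(aipw_ate(y, t, mu1, mu0, e))
```

The estimate stays consistent if either the outcome model or the propensity model is correctly specified, which is what "doubly robust" refers to.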
RMSSD-Based Risk Prediction Using Wearable Sensor
Ha-Neul Kim  Young-Seob Jeong  Jeung-Im Kim
As the use of wearable devices continues to expand, research utilizing daily collected biosignals to assess autonomic nervous system activity, stress levels, and cardiovascular health has been actively conducted. This study proposes a model to predict the 24-hour-ahead value of RMSSD (Root Mean Square of Successive Differences), a key indicator of HRV (Heart Rate Variability), using heart rate and step count data obtained from wearable sensors. Public wearable datasets provided by the Jeju Free International City Development Center were used, and data from 95 adult participants were included in the analysis. Various statistical features were derived from heart rate and step count signals to construct the input variables, and 14 machine learning and time-series forecasting algorithms were compared. The results showed that before data augmentation, the XGBoost model achieved an RMSE of 8.69 and an MAE of 7.05, while after augmentation, the Random Forest model yielded the lowest prediction error with an RMSE of 8.65 and an MAE of 7.01. These findings demonstrate that wearable-derived biosignals can be effectively used to predict future RMSSD values and provide a foundation for developing personalized health management systems.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 139-145, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.139
Heart Rate Variability Wearable Devices Time-Series Forecasting Machine Learning Risk Prediction
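For readers unfamiliar with RMSSD, the prediction target in the abstract above, the sketch below computes it from a sequence of intervals between heartbeats. It only illustrates the definition of the metric; the function name and sample values are hypothetical, and the forecasting models compared in the study are not shown.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (in ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds.
print(rmssd([800, 810, 790, 805]))
```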
Health Evaluation for Collaborative Robot Using Vision-Based Motion Data
Hui Chan Yang  Min Seo Choi  Jin Se Kim  Jung won Lee
Collaborative robots require health evaluation to prevent human injury during human–robot collaboration. However, their programmable characteristics result in diverse tasks and complex data patterns, while existing methods relying on internal sensor data face limitations due to differences in sensor types and units across devices. This study proposes a vision-based health evaluation method that overcomes task diversity and is independent of internal sensor data. The proposed method collects normal-state motion data by designing a workspace-based test program for the target collaborative robot, trains an LSTM-based prediction model using the data, and assesses robot health by measuring the similarity between actual and predicted motion signals. Applied to the Indy7 collaborative robot by Neuromeka, the proposed method achieved 87.79% accuracy in distinguishing normal and degraded states. Parameter tuning for normal range boundaries yielded up to 100% true positive and 99.5% true negative rates, confirming its capability for quantitative and user-oriented health evaluation.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 146-159, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.146
Collaborative Robot predictive maintenance Health Evaluation Vision Data
Integrated Collision Detection Framework Based on Pose, Depth, and Object Estimation Using a Single RGB Camera
Ye Chan Kim  Min Ju Hong  Hosang Yoo  Byeong Soo Kim
Ensuring worker safety in dynamic industrial environments requires collision detection systems that are fast, accurate, and cost-efficient. Conventional solutions based on LiDAR or RGB-D sensors deliver high precision but are costly, maintenance-heavy, and sensitive to environmental conditions such as poor lighting, occlusion, and reflective surfaces. To address these limitations, this study proposes a lightweight collision detection framework using only a single RGB camera. The system integrates YOLOv8 for object detection, ViTPose for human pose estimation, and Depth Anything for monocular depth estimation to reconstruct 3D positions of human joints and object centers in real time. A dedicated collision detection algorithm computes Euclidean distances between these 3D coordinates, while temporal smoothing and joint-structure-based outlier correction enhance stability and reliability. The proposed framework was validated on a custom dataset of 55 scenario videos under diverse lighting conditions, camera angles, and interaction types. Experimental results show that the system achieved an F1-score of 1.0 across all scenarios, indicating high accuracy and robustness. These findings demonstrate the potential of single RGB camera systems as practical, scalable, and cost-effective alternatives to expensive sensor-based approaches, enabling real-time collision detection for industrial safety monitoring.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 160-168, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.160
3D Spatial Analysis collision detection depth estimation Object Detection Pose Estimation Single RGB Camera
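The distance-based collision check described in the abstract above can be sketched as follows: compute Euclidean distances between 3D joint positions and object centers, smooth the per-frame minimum over time, and flag frames that fall below a threshold. The moving-average smoothing, the threshold, the window size, and all names are assumptions for illustration; the outlier correction and the detection, pose, and depth models of the paper are not shown.

```python
import math


def min_joint_object_distance(joints, object_centers):
    """Smallest Euclidean distance between any joint and any object center."""
    return min(math.dist(j, o) for j in joints for o in object_centers)


def detect_collisions(frames, threshold=0.5, window=3):
    """Return one boolean per frame: True when the smoothed distance is below threshold.

    frames: list of (joints, object_centers), each a list of (x, y, z) tuples.
    threshold and window are hypothetical tuning parameters.
    """
    history = []
    flags = []
    for joints, object_centers in frames:
        history.append(min_joint_object_distance(joints, object_centers))
        # Moving average over the most recent frames damps single-frame jitter.
        recent = history[-window:]
        smoothed = sum(recent) / len(recent)
        flags.append(smoothed < threshold)
    return flags


# Hypothetical sequence: an object center approaches a single joint at the origin.
frames = [
    ([(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]),
    ([(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]),
    ([(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)]),
    ([(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)]),
]
print(detect_collisions(frames, threshold=0.5, window=2))
```

Smoothing trades a short detection delay for fewer false alarms caused by noisy single-frame depth estimates.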
A Study on the Multimodal Fraud Transaction Detection Model Based on Financial Transactions and Signature Activities
Chan-sik Sung  Kwan-yeol Park  Tae-yang Park
The proliferation of non-face-to-face financial transactions has made it increasingly important to detect abnormal transactions and signature forgery or alteration. Existing approaches rely on single-modal data and therefore do not sufficiently reflect the dynamic characteristics of signature behavior. This study proposes a multimodal artificial intelligence model (MAIFDM) that integrates and analyzes transaction data, signature images, and handwriting behavior time series. MAIFDM combines time-space attention learning, context embedding learning, and time-series correlation mismatch learning to fuse the features of the three modules, and then determines whether a transaction is abnormal using the Mahalanobis distance and an adaptive dynamic threshold. In experiments, MAIFDM outperformed the existing model with an F1-score of 0.907 and an AUC of 0.942, indicating that it is effective for multimodal data learning and fraudulent transaction detection.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 169-179, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.169
space-time attention learning context embedding learning time series correlation mismatch learning fusion learning multimodal data learning
BioVectorField: Learning Transcriptomic Vector Fields from Irregularly Sampled scRNA-seq to Map Deterioration–Recovery Dynamics
Junku Kim  Kyuri Jo
We present BioVectorField (BioVF), a framework that learns gene-expression dynamics underlying disease progression from irregularly sampled single-cell RNA-seq time-series data. For each patient, single-cell profiles were aggregated into cell-type–specific pseudo-bulk representations at each timepoint, projected into a shared PCA space, and linked across consecutive intervals to extract temporal changes. BioVF models these changes to reconstruct a continuous transcriptional vector field and visualizes latent biological trajectories spanning the deterioration-to-recovery continuum of disease. Because the learned field encodes the directionality of temporal gene-expression changes, it enables the construction of quantitative metrics for discriminating disease phases such as deterioration and recovery. When applied to Korean COVID-19 patient data, BioVF successfully recapitulated key features of the clinical deterioration–recovery trajectory and demonstrated predictive utility. Furthermore, GSEA revealed that genes contributing to the vector-field–based predictions were associated with coherent immune activation and resolution processes in high-dimensional transcriptional space.
The Transactions of the Korea Information Processing Society, Vol. 15, No. 2, pp. 180-187, Feb. 2026
https://doi.org/10.3745/TKIPS.2026.15.2.180
scRNA-seq irregular time-series vector field regression deterioration–recovery dynamics