A Study on the Blocking of Malicious Behavior of Generative AI Input Prompts Using Small Language Model Module 


Vol. 14,  No. 12, pp. 1044-1050, Dec.  2025
10.3745/TKIPS.2025.14.12.1044


PDF
  Abstract

Large language models (LLMs) are useful for search, coding, and agentic workflows, but because input prompts directly control their behavior, they are vulnerable to prompt injection (direct and indirect), jailbreaks, format/Unicode evasion, resource exhaustion, and misuse of tools/plugins. We propose a pre-inference prompt model that filters and blocks risks before any model call by combining a lightweight global classifier with threat-specific small language model (SLM) modules, routing by calibrated confidence and policy mapping, and removing format evasions through preprocessing such as Unicode normalization/decoding and suffix sanitization. Our evaluation on public benchmarks and real-world scenarios reports block-failure rate, over-blocking rate, latency and cost, calibration error, and domain-related metrics. We also present integration with multi-agent defenses and post-hoc moderation, along with a deployment guide grounded in least-privilege, provenance verification, and isolation principles.

  Statistics


  Cite this article

[IEEE Style]

M. J. In, R. D. Hoon, D. Yoo, "A Study on the Blocking of Malicious Behavior of Generative AI Input Prompts Using Small Language Model Module," The Transactions of the Korea Information Processing Society, vol. 14, no. 12, pp. 1044-1050, 2025. DOI: 10.3745/TKIPS.2025.14.12.1044.

[ACM Style]

Mun Jong In, Ryu Dong Hoon, and Dong-Young Yoo. 2025. A Study on the Blocking of Malicious Behavior of Generative AI Input Prompts Using Small Language Model Module. The Transactions of the Korea Information Processing Society, 14, 12, (2025), 1044-1050. DOI: 10.3745/TKIPS.2025.14.12.1044.