Windows Malware Family Dataset Construction and Classification 


Vol. 14, No. 9, pp. 651-661, Sep. 2025
https://doi.org/10.3745/TKIPS.2025.14.9.651


  Abstract

Malware family classification is critical for efficient threat analysis and rapid response. Accurate classification remains difficult, however, because different families behave similarly and the boundaries between variants are ambiguous. Moreover, most previous studies rely on outdated datasets, which limits their ability to reflect current malware trends. To address these issues, this study constructs a new dataset of 3,357 Windows malware samples collected in 2024, with high label reliability ensured through cross-verification. Using this dataset, we trained a Random Forest model on hybrid features that combine static and dynamic features, achieving a maximum classification accuracy of 92.14%. An analysis of misclassified samples showed that errors were often caused either by API call sequences shared among certain malware families or by premature termination of malware execution, which prevented the collection of sufficient dynamic information. Based on these findings, we suggest that more sophisticated behavior-based feature extraction is needed, along with improvements to the dynamic analysis environment that prevent early termination. This study is expected to make a practical contribution to the accuracy and reliability of future malware detection systems.
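
As a rough illustration of the hybrid feature idea and the two error sources named in the abstract, the following sketch concatenates static features with API-call n-gram counts. It is not the authors' pipeline: the static features (a section count and an import count), the bigram representation, the API traces, and all function names are assumptions made for this example. Vectors built this way could then be passed to a Random Forest implementation such as scikit-learn's RandomForestClassifier.

```python
from collections import Counter


def api_ngrams(calls, n=2):
    """Count contiguous n-grams in a sequence of API call names."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def hybrid_vector(static_feats, calls, vocab, n=2):
    """Concatenate static features with n-gram counts over a fixed vocabulary."""
    grams = api_ngrams(calls, n)
    # Counter returns 0 for n-grams that never occurred in this trace.
    return list(static_feats) + [grams[g] for g in vocab]


def jaccard(a, b):
    """Jaccard similarity between the n-gram sets of two traces."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Hypothetical traces from two different families that share a common prefix.
trace_a = ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"]
trace_b = ["CreateFileW", "WriteFile", "CloseHandle", "InternetOpenW"]
# Hypothetical trace from a sample that terminated almost immediately.
trace_short = ["CreateFileW"]

# Fixed vocabulary of bigrams observed in the reference traces.
vocab = sorted(set(api_ngrams(trace_a)) | set(api_ngrams(trace_b)))

# Static features here are assumed values: (section count, import count).
vec_a = hybrid_vector([5, 120], trace_a, vocab)
vec_short = hybrid_vector([4, 98], trace_short, vocab)

# Overlap between the two families' dynamic behavior.
similarity = jaccard(api_ngrams(trace_a), api_ngrams(trace_b))

print(vec_a)
print(vec_short)
print(similarity)
```

In this toy setup the two families share half of their API bigrams, which hints at why a classifier may confuse them, and the early-terminated sample contributes an all-zero dynamic part, leaving only its static features to separate it from other families.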

  Cite this article

[IEEE Style]

K. T. Young, D. S. Choi, E. G. Im, "Windows Malware Family Dataset Construction and Classification," The Transactions of the Korea Information Processing Society, vol. 14, no. 9, pp. 651-661, 2025. DOI: https://doi.org/10.3745/TKIPS.2025.14.9.651.

[ACM Style]

Kim Tae Young, Doo Seop Choi, and Eul Gyu Im. 2025. Windows Malware Family Dataset Construction and Classification. The Transactions of the Korea Information Processing Society, 14, 9, (2025), 651-661. DOI: https://doi.org/10.3745/TKIPS.2025.14.9.651.