Clustering-based Model Compression Method for Deep Neural Networks 


Vol. 13,  No. 11, pp. 585-589, Nov.  2024
https://doi.org/10.3745/TKIPS.2024.13.11.585


  Abstract

On-device machine learning is gaining popularity for its cost efficiency, data privacy, and responsiveness. However, running deep neural network models on small embedded systems is challenging because of their limited memory capacity. Prior work has proposed various model compression techniques, such as quantization and pruning, but these techniques generally require careful fine-tuning with suitable data samples to minimize the accuracy loss caused by compression. This work proposes a new post-training model compression method that compresses an input model by clustering similar convolution kernels and pruning the redundant ones. The proposed method requires no data samples because it considers only the similarity between kernels. This work evaluates the proposed method on representative neural network models and demonstrates that it can effectively reduce memory usage with only a small loss of accuracy.
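The abstract does not specify the clustering algorithm or similarity measure used by the authors. As a minimal illustration of the general idea only, the sketch below greedily clusters convolution kernels by cosine similarity and keeps one representative per cluster; the function name, the threshold value, and the greedy strategy are all assumptions for illustration, not the paper's method.

```python
import numpy as np

def cluster_prune_kernels(kernels, threshold=0.95):
    """Greedy clustering of convolution kernels by cosine similarity.

    This is an illustrative sketch, not the paper's algorithm.
    kernels: array of shape (n, kh, kw), one 2-D kernel per entry.
    Returns (representatives, assignment), where assignment[i] gives
    the index of the representative kernel used in place of kernel i.
    """
    # Flatten each kernel and normalize to unit length so that a dot
    # product of two rows is their cosine similarity.
    flat = kernels.reshape(len(kernels), -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)

    reps = []        # indices of kept (representative) kernels
    assignment = []
    for i in range(len(unit)):
        for ci, r in enumerate(reps):
            if float(unit[r] @ unit[i]) >= threshold:
                # Similar enough to an existing representative: prune
                # this kernel and reuse the representative instead.
                assignment.append(ci)
                break
        else:
            # No sufficiently similar cluster yet: keep this kernel.
            reps.append(i)
            assignment.append(len(reps) - 1)
    return kernels[reps], np.array(assignment)
```

For example, a kernel and a scaled copy of it have cosine similarity 1.0 and collapse into one cluster, while a dissimilar kernel is kept as its own representative, so only the representatives and a small assignment table need to be stored.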

  Cite this article

[IEEE Style]

B. Chae and S. Heo, "Clustering-based Model Compression Method for Deep Neural Networks," The Transactions of the Korea Information Processing Society, vol. 13, no. 11, pp. 585-589, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.11.585.

[ACM Style]

Byungchul Chae and Seonyeong Heo. 2024. Clustering-based Model Compression Method for Deep Neural Networks. The Transactions of the Korea Information Processing Society, 13, 11, (2024), 585-589. DOI: https://doi.org/10.3745/TKIPS.2024.13.11.585.