Efficient Scheduling for Inference of Complex Neural Network Models on Multi-GPU Systems 


Vol. 13,  No. 11, pp. 604-618, Nov.  2024
https://doi.org/10.3745/TKIPS.2024.13.11.604


  Abstract

A key challenge for recent complex neural network models, which achieve competitive accuracy, is deploying them efficiently on multi-GPU systems. The complex inter-layer dependencies of these models, combined with the variable data-communication overhead of multi-GPU systems, make it nearly impossible to achieve a meaningful performance gain through manual scheduling. To address this problem, we propose a new layer-scheduling approach called NN Maestro, which generates an efficient parallel execution strategy for multi-GPU systems that minimizes data-communication overhead, thereby reducing the inference latency of complex neural network models. NN Maestro first evaluates the benefit of multi-GPU scheduling using a pre-trained SVM classifier and determines the scheduling order of layers based on a topological sort and a Significance Cost. It then selects the optimal GPU for each layer by comparing Placement Costs and generates the final schedule by grouping layers for parallel execution. Across various multi-GPU configurations (two 2080Ti, four V100, and four A6000 GPUs), NN Maestro achieves up to a 1.67x performance improvement over the baseline.
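The ordering-and-placement scheme described above can be illustrated with a small sketch. This is not the authors' implementation; the function name, the tie-breaking rule, and the `placement_cost` callback are assumptions used only to show the general shape of the approach: a topological sort over the layer DAG with ready layers prioritized by a significance score, followed by a greedy per-layer GPU choice that minimizes a placement cost.

```python
import heapq
from collections import defaultdict

def schedule_layers(layers, edges, significance, placement_cost, num_gpus):
    """Toy layer scheduler loosely inspired by the NN Maestro description:
    order layers by topological sort (ties broken by a Significance Cost),
    then place each layer on the GPU with the lowest Placement Cost.
    All names and cost models here are illustrative placeholders."""
    # Build successor lists and in-degree counts (Kahn's algorithm).
    succ = defaultdict(list)
    indeg = {layer: 0 for layer in layers}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Ready heap keyed by negative significance: among ready layers,
    # the one with the higher significance is scheduled first.
    ready = [(-significance[l], l) for l in layers if indeg[l] == 0]
    heapq.heapify(ready)

    order, assignment = [], {}
    while ready:
        _, layer = heapq.heappop(ready)
        order.append(layer)
        # Pick the GPU minimizing the (hypothetical) placement cost,
        # which could combine compute time and data-transfer overhead
        # from the GPUs holding the layer's inputs.
        best_gpu = min(range(num_gpus),
                       key=lambda g: placement_cost(layer, g, assignment))
        assignment[layer] = best_gpu
        for nxt in succ[layer]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                heapq.heappush(ready, (-significance[nxt], nxt))
    return order, assignment
```

In a real scheduler the placement cost would be measured or profiled per device; here any callable taking the layer, candidate GPU, and current assignment suffices, which keeps the sketch self-contained.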



  Cite this article

[IEEE Style]

S. Jung, S. Lee, B. W. Kang, Y. Park, "Efficient Scheduling for Inference of Complex Neural Network Models on Multi-GPU Systems," The Transactions of the Korea Information Processing Society, vol. 13, no. 11, pp. 604-618, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.11.604.

[ACM Style]

Sunwook Jung, Seongju Lee, Beom Woo Kang, and Yongjun Park. 2024. Efficient Scheduling for Inference of Complex Neural Network Models on Multi-GPU Systems. The Transactions of the Korea Information Processing Society, 13, 11, (2024), 604-618. DOI: https://doi.org/10.3745/TKIPS.2024.13.11.604.