Efficient Scheduling for Inference of Complex Neural Network Models on Multi-GPU Systems 


Vol. 13,  No. 11, pp. 604-618, Nov.  2024
https://doi.org/10.3745/TKIPS.2024.13.11.604


  Abstract

A key challenge for recent complex neural network models, which achieve competitive accuracy, is deploying them efficiently on multi-GPU systems. The complex inter-layer dependencies of these models, combined with the variable data-communication overhead of multi-GPU systems, make it nearly impossible to achieve a meaningful performance gain through manual scheduling. To address this problem, we propose a new layer-scheduling approach called NN Maestro, which generates an efficient parallel execution strategy for multi-GPU systems that minimizes data-communication overhead, thereby reducing the inference latency of complex neural network models. NN Maestro first evaluates the benefit of multi-GPU scheduling using a pre-trained SVM classifier and determines the scheduling order of layers based on a topological sort and a Significance Cost. It then selects the optimal GPU for each layer by comparing Placement Costs and generates the final schedule by grouping layers for parallel execution. Across various multi-GPU configurations (two 2080Ti, four V100, and four A6000 GPUs), NN Maestro achieves up to a 1.67x performance improvement over the baseline.
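The ordering-and-placement scheme described above can be illustrated with a small sketch. This is not the authors' implementation; the function name, the tie-breaking rule, and the `placement_cost` callback are assumptions used only to show the general shape of the approach: a topological sort over the layer DAG with ready layers prioritized by a significance score, followed by a greedy per-layer GPU choice that minimizes a placement cost.

```python
import heapq
from collections import defaultdict

def schedule_layers(layers, edges, significance, placement_cost, num_gpus):
    """Toy layer scheduler loosely inspired by the NN Maestro description:
    order layers by topological sort (ties broken by a Significance Cost),
    then place each layer on the GPU with the lowest Placement Cost.
    All names and cost models here are illustrative placeholders."""
    # Build successor lists and in-degree counts (Kahn's algorithm).
    succ = defaultdict(list)
    indeg = {layer: 0 for layer in layers}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Ready heap keyed by negative significance: among ready layers,
    # the one with the higher significance is scheduled first.
    ready = [(-significance[l], l) for l in layers if indeg[l] == 0]
    heapq.heapify(ready)

    order, assignment = [], {}
    while ready:
        _, layer = heapq.heappop(ready)
        order.append(layer)
        # Pick the GPU minimizing the (hypothetical) placement cost,
        # which could combine compute time and data-transfer overhead
        # from the GPUs holding the layer's inputs.
        best_gpu = min(range(num_gpus),
                       key=lambda g: placement_cost(layer, g, assignment))
        assignment[layer] = best_gpu
        for nxt in succ[layer]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                heapq.heappush(ready, (-significance[nxt], nxt))
    return order, assignment
```

In a real scheduler the placement cost would be measured or profiled per device; here any callable taking the layer, candidate GPU, and current assignment suffices, which keeps the sketch self-contained.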



  Cite this article

[IEEE Style]

S. Jung, S. Lee, B. W. Kang, Y. Park, "Efficient Scheduling for Inference of Complex Neural Network Models on Multi-GPU Systems," The Transactions of the Korea Information Processing Society, vol. 13, no. 11, pp. 604-618, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.11.604.

[ACM Style]

Sunwook Jung, Seongju Lee, Beom Woo Kang, and Yongjun Park. 2024. Efficient Scheduling for Inference of Complex Neural Network Models on Multi-GPU Systems. The Transactions of the Korea Information Processing Society, 13, 11, (2024), 604-618. DOI: https://doi.org/10.3745/TKIPS.2024.13.11.604.