FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs

dc.centro: E.T.S.I. Informática
dc.contributor.author: López Albelda, Bernabé
dc.contributor.author: Castro, Francisco M.
dc.contributor.author: González-Linares, José María
dc.contributor.author: Guil-Mata, Nicolás
dc.date.accessioned: 2025-02-21T09:38:20Z
dc.date.available: 2025-02-21T09:38:20Z
dc.date.created: 2025
dc.date.issued: 2022
dc.departamento: Arquitectura de Computadores
dc.description.abstract: Nowadays, GPU clusters are available in almost every data processing center. Their GPUs are typically shared by different applications that may have different processing needs and/or different levels of priority. In this scenario, concurrent kernel execution can improve device utilization by co-executing kernels with different or complementary resource utilization profiles. A paramount issue in concurrent kernel execution on GPUs is obtaining a suitable distribution of streaming multiprocessor (SM) resources among co-executing kernels to fulfill different scheduling aims. In this work, we present a software scheduler, named FlexSched, that employs a low-overhead runtime mechanism to perform intra-SM allocation of cooperative thread arrays (a.k.a. thread blocks) of co-executing kernels. It also implements an efficient online profiling mechanism that dynamically changes the resource assignment of co-running kernels according to their instantaneous performance. An important characteristic of our approach is that no offline kernel analysis is required to establish the best resource assignment for co-located kernels. Thus, it can run on any system where new applications must be scheduled immediately. Using a set of nine applications (13 kernels), we show that our approach improves the co-execution performance of recent slicing methods: it obtains a co-execution speedup of 1.40×, while the slicing method achieves only 1.29×. In addition, we test FlexSched in a real scheduling scenario where new applications are launched as soon as GPU resources become available. In this scenario, FlexSched reduces the average overall execution time by a factor of 1.25× with respect to the time obtained when proprietary hardware (Hyper-Q) is employed. Finally, FlexSched is also used to implement scheduling policies that guarantee a maximum turnaround time for latency-sensitive applications while achieving high resource utilization through kernel co-execution.
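The online profiling mechanism described in the abstract — measuring the instantaneous performance of co-running kernels and shifting intra-SM thread blocks toward whichever kernel benefits most — can be illustrated with a toy sketch. All names and the closed-form performance models below are hypothetical; FlexSched's actual mechanism profiles real kernels on the GPU at runtime rather than evaluating analytic functions.

```python
# Hypothetical sketch (not FlexSched's implementation) of online tuning of
# an intra-SM thread-block partition between two co-executing kernels.

def combined_throughput(blocks_a, blocks_b, perf_a, perf_b):
    """Sum of normalized throughputs for a given block split.
    perf_x(n) stands in for a profiled measurement of kernel x's
    throughput when it is granted n thread blocks per SM."""
    return perf_a(blocks_a) + perf_b(blocks_b)

def tune_partition(total_blocks, perf_a, perf_b, steps=32):
    """Greedy hill climbing over the block partition: mimic a profiling
    phase that moves one block between kernels whenever doing so
    improves the combined instantaneous throughput."""
    a = total_blocks // 2  # start from an even split
    for _ in range(steps):
        best_score = combined_throughput(a, total_blocks - a, perf_a, perf_b)
        best_a = a
        for cand in (a - 1, a + 1):  # try shifting one block either way
            if 1 <= cand <= total_blocks - 1:
                score = combined_throughput(cand, total_blocks - cand,
                                            perf_a, perf_b)
                if score > best_score:
                    best_score, best_a = score, cand
        if best_a == a:  # local optimum reached; keep this partition
            break
        a = best_a
    return a, total_blocks - a

# Toy models: a memory-bound kernel saturates after a few blocks,
# while a compute-bound kernel scales almost linearly with blocks.
mem_bound = lambda n: min(n, 4) / 4.0
compute_bound = lambda n: n / 16.0

split = tune_partition(16, mem_bound, compute_bound)
print(split)  # the memory-bound kernel keeps few blocks; the rest go
              # to the compute-bound kernel
```

With these toy models the tuner converges on giving the memory-bound kernel only the blocks it can actually use, handing the remainder to the compute-bound kernel — the complementary-profile co-execution the abstract describes.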
dc.description.sponsorship: This work has been supported by the Junta de Andalucía of Spain (P18-FR-3130) and the Ministry of Education of Spain (PID2019-105396RB-I00). We also thank Nvidia for hardware donations within its GPU Grant Program.
dc.identifier.citation: López-Albelda, B., Castro, F.M., González-Linares, J.M. et al. FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs. J Supercomput 78, 43–71 (2022). https://doi.org/10.1007/s11227-021-03819-z
dc.identifier.doi: 10.1007/s11227-021-03819-z
dc.identifier.uri: https://hdl.handle.net/10630/37987
dc.language.iso: eng
dc.publisher: Springer Nature
dc.rights.accessRights: open access
dc.subject: Computer programming
dc.subject: Software
dc.subject.other: GPU scheduling
dc.subject.other: Concurrent kernel execution
dc.subject.other: Online profiling
dc.subject.other: Simultaneous multikernel
dc.title: FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs
dc.type: journal article
dc.type.hasVersion: AM
dspace.entity.type: Publication
relation.isAuthorOfPublication: 3388700c-0831-457c-9cf8-ca14cec33a15
relation.isAuthorOfPublication: bed8ca48-652e-4212-8c3c-05bfdc85a378
relation.isAuthorOfPublication.latestForDiscovery: 3388700c-0831-457c-9cf8-ca14cec33a15

Files

Original bundle

Name: FlexSched_Final.pdf
Size: 566.7 KB
Format: Adobe Portable Document Format