
dc.contributor.author: Rodríguez-Moreno, Andrés
dc.contributor.author: González-Navarro, María Ángeles
dc.contributor.author: Nikov, Kris
dc.contributor.author: Nunez-Yanez, José
dc.contributor.author: Gran-Tejero, Rubén
dc.contributor.author: Suárez Gracia, Darío
dc.contributor.author: Asenjo-Plaza, Rafael
dc.date.accessioned: 2022-05-16T09:25:45Z
dc.date.available: 2022-05-16T09:25:45Z
dc.date.issued: 2022-03
dc.identifier.citation: Andrés Rodríguez, Angeles Navarro, Kris Nikov, Jose Nunez-Yanez, Rubén Gran, Darío Suárez Gracia, Rafael Asenjo, Lightweight asynchronous scheduling in heterogeneous reconfigurable systems, Journal of Systems Architecture, Volume 124, 2022, 102398, ISSN 1383-7621, https://doi.org/10.1016/j.sysarc.2022.102398 [es_ES]
dc.identifier.uri: https://hdl.handle.net/10630/24124
dc.description.abstract: The trend for heterogeneous embedded systems is the integration of accelerators and general-purpose CPU cores on the same die. In these integrated architectures, like the Zynq UltraScale+ board (CPU+FPGA) that we target in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores makes the case for exploring strategies that exploit a tight collaboration between the CPUs and the accelerator. In this paper we propose a novel lightweight scheduling strategy, FastFit, targeted to FPGA accelerators, and a new scheduler based on it, named MultiFastFit, which asynchronously tackles heterogeneous systems comprising a variety of CPU cores and FPGA IPs. Our strategy significantly reduces the overhead of automatically computing the near-optimal chunksizes when compared to a previous state-of-the-art auto-tuned approach, which makes our approach more suitable for fine-grained applications. Additionally, our scheduler MultiFastFit has been designed to enable the efficient co-execution of work among compute devices in such a way that all the devices are busy while minimizing the load imbalance. Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunksize for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than the CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms another auto-tuned approach by up to 2x and that it achieves similar results to manually-tuned schedulers without requiring an offline search of the ideal CPU-FPGA partition or FPGA chunk granularity. [es_ES]
dc.description.sponsorship: This work was partially supported by the Spanish projects PID2019-105396RB-I00, UMA18-FEDERJA-108, and UK EPSRC projects ENEAC (EP/N002539/1), HOPWARE (EP/V040863/1) and RS MINET (INF\R2\192044). Funding for open access charge: Universidad de Málaga / CBUA. [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Elsevier [es_ES]
dc.rights: info:eu-repo/semantics/openAccess [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: Heterogeneous computing [es_ES]
dc.subject.other: Heterogeneous architecture [es_ES]
dc.subject.other: FPGA [es_ES]
dc.subject.other: Heterogeneous scheduling [es_ES]
dc.subject.other: Throughput model [es_ES]
dc.subject.other: Energy efficiency [es_ES]
dc.title: Lightweight asynchronous scheduling in heterogeneous reconfigurable systems [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.centro: E.T.S.I. Informática [es_ES]
dc.identifier.doi: https://doi.org/10.1016/j.sysarc.2022.102398
dc.rights.cc: Attribution 4.0 International [*]


Except where otherwise noted, this item's license is described as Attribution 4.0 International