JavaScript is disabled for your browser. Some features of this site may not work without it.

    Listar

    Todo RIUMAComunidades & ColeccionesPor fecha de publicaciónAutoresTítulosMateriasTipo de publicaciónCentrosDepartamentos/InstitutosEditoresEsta colecciónPor fecha de publicaciónAutoresTítulosMateriasTipo de publicaciónCentrosDepartamentos/InstitutosEditores

    Mi cuenta

    AccederRegistro

    Estadísticas

    Ver Estadísticas de uso

    DE INTERÉS

    Datos de investigaciónReglamento de ciencia abierta de la UMAPolítica de RIUMAPolitica de datos de investigación en RIUMAOpen Policy Finder (antes Sherpa-Romeo)Dulcinea
    Preguntas frecuentesManual de usoContacto/Sugerencias
    Ver ítem 
    •   RIUMA Principal
    • Investigación
    • Artículos
    • Ver ítem
    •   RIUMA Principal
    • Investigación
    • Artículos
    • Ver ítem

    Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU

    • Autor
      Campos, Cristian; Asenjo, Rafael; Navarro, Ángeles
    • Fecha
      2025-01-24
    • Editorial/Editor
      Springer Nature
    • Palabras clave
      Computación heterogénea
    • Resumen
      n recent times, oneAPI has emerged as a competitive framework to optimize streaming applications on heterogeneous CPU+GPU architectures, since it provides portability and performance thanks to the SYCL programming language and effi-cient parallel libraries as oneTBB. However, this approach opens up a wealth of implementations alternatives in this type of applications: from how to design the data flow to how to exploit data parallelism. Choosing the best alternative is not triv-ial, so in this paper we analyze them and contribute with an analytical model based on queue theory that helps in the on-line selection of the alternative that maximizes the throughput and the occupancy of the CPU and GPU compute units. We explore the design space offered by: a) different APIs to define the data flow (paral-lel_pipeline and Flow Graph from oneTBB, and SYCL events from SYCL); b) alternative kernel implementations to express data parallelism (SYCL, AVX and std::simd); and c) the mapping of the kernels into the available comput-ing resources (CPU cores and GPU). The results show that the std::simd library can be 1.54x faster, 3% more energy efficient, and requires 7.36x less programming effort than AVX, and that implementations that enable asynchronous offloading of tasks to the devices as those based on SYCL events and Flow Graph APIs outper-form the other APIs, being up to 1.10x faster and up to 1.18x more energy efficient.
    • URI
      https://hdl.handle.net/10630/37292
    • DOI
      https://dx.doi.org/https://doi.org/10.1007/s11227-024-06891-3
    • Compartir
      RefworksMendeley
    Mostrar el registro completo del ítem
    Ficheros
    s11227-024-06891-3 (1).pdf (2.764Mb)
    Colecciones
    • Artículos

    Estadísticas

    REPOSITORIO INSTITUCIONAL UNIVERSIDAD DE MÁLAGA
    REPOSITORIO INSTITUCIONAL UNIVERSIDAD DE MÁLAGA
     

     

    REPOSITORIO INSTITUCIONAL UNIVERSIDAD DE MÁLAGA
    REPOSITORIO INSTITUCIONAL UNIVERSIDAD DE MÁLAGA