Efficient OpenCL-based concurrent tasks offloading on accelerators

Publisher

Procedia Computer Science

Abstract

Current heterogeneous platforms with CPUs and accelerators can launch several independent tasks simultaneously to exploit the concurrency among them. These tasks typically consist of data transfer commands and kernel computation commands. In this paper we develop a runtime approach that optimizes the concurrency between data transfer and kernel computation commands in a multithreaded scenario where each CPU thread offloads tasks to the accelerator. The runtime uses a heuristic based on a temporal execution model for concurrent tasks, which establishes a near-optimal task execution order that significantly reduces the total execution time, including data transfers. We evaluated our approach on five benchmarks composed of real kernel-dominant and transfer-dominant tasks. In these experiments our heuristic achieves speedups of up to 1.5x on AMD R9 and NVIDIA K20c accelerators and up to 1.3x on an Intel Xeon Phi (KNC) device.
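
To make the idea of a temporal execution model concrete, here is a minimal Python sketch under loudly stated assumptions; it is not the authors' model or heuristic. It assumes each task is three commands (host-to-device transfer, kernel, device-to-host transfer), that the device has exactly one engine per command type which runs commands one at a time in submission order, and that all durations are known in advance. The helper names `makespan` and `best_order` and the durations used below are hypothetical.

```python
from itertools import permutations
from typing import Iterable, Sequence

# Hypothetical task description: (htd, kernel, dth) durations in arbitrary time units.
Task = tuple[float, float, float]


def makespan(order: Iterable[Task]) -> float:
    """Total execution time of tasks issued in `order` under a simplified model
    (assumption): one host-to-device copy engine, one compute engine and one
    device-to-host copy engine, each executing its commands sequentially."""
    htd_end = kern_end = dth_end = 0.0
    for htd, kern, dth in order:
        # Input transfers are serialized on the single host-to-device engine.
        htd_end += htd
        # A kernel waits for its own input and for the previous kernel.
        kern_end = max(kern_end, htd_end) + kern
        # An output transfer waits for its kernel and for the previous output copy.
        dth_end = max(dth_end, kern_end) + dth
    return dth_end


def best_order(tasks: Sequence[Task]) -> tuple[Task, ...]:
    """Exhaustive search over all issue orders; only practical for a few tasks."""
    return min(permutations(tasks), key=makespan)


if __name__ == "__main__":
    a = (1.0, 1.0, 5.0)  # short input, long output (hypothetical)
    b = (5.0, 1.0, 1.0)  # long input, short output (hypothetical)
    print(makespan([a, b]), makespan([b, a]), best_order([b, a]))
```

With these values, issuing A before B gives a makespan of 8 time units versus 12 for B before A, because A's long output transfer overlaps B's long input transfer. The exhaustive search grows factorially with the number of tasks, which is why a heuristic such as the one the abstract describes is needed in practice.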
