Improvements in Hardware Transactional Memory for GPU Architectures

dc.centro: E.T.S.I. Informática [es_ES]
dc.contributor.author: Villegas Fernández, Alejandro
dc.contributor.author: Asenjo-Plaza, Rafael
dc.contributor.author: González-Navarro, María Ángeles
dc.contributor.author: Plata-González, Óscar Guillermo
dc.date.accessioned: 2016-07-20T09:39:11Z
dc.date.available: 2016-07-20T09:39:11Z
dc.date.created: 2016
dc.date.issued: 2016-07-20
dc.departamento: Arquitectura de Computadores
dc.description.abstract: In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and programmers use it to speed up their applications thanks to its low latency. Prior work from the authors proposed a lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key to providing an effective synchronization mechanism at the local memory level. After a brief description of the main features of our HTM design for GPU local memory, this work gathers a number of proposals aimed at improving the mechanisms with the highest impact on performance. First, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Second, the conflict detection mechanism is optimized depending on application characteristics, such as the read/write sets, the probability of conflict between transactions, and the existence of read-only transactions. As these features can be present in hardware simultaneously, it falls to the compiler and runtime to determine which ones are more important for a given application. This work includes a discussion of the analysis needed to choose the best configuration. [es_ES]
dc.description.sponsorship: Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. [es_ES]
dc.identifier.uri: http://hdl.handle.net/10630/11858
dc.language.iso: eng [es_ES]
dc.relation.eventdate: 6 July 2016 [es_ES]
dc.relation.eventplace: Valladolid, Spain [es_ES]
dc.relation.eventtitle: 18th International Workshop on Compilers for Parallel Computing (CPC’15) [es_ES]
dc.rights: by-nc-nd
dc.rights.accessRights: open access [es_ES]
dc.subject: Computers - Input/output equipment [es_ES]
dc.subject.other: Hardware Transactional Memory [es_ES]
dc.subject.other: GPU [es_ES]
dc.title: Improvements in Hardware Transactional Memory for GPU Architectures [es_ES]
dc.type: conference output [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 6ea008bf-69ee-4104-a942-2033b5b07ab8
relation.isAuthorOfPublication: 0857b903-5728-47c9-b298-a203bf081d23
relation.isAuthorOfPublication: 34b85e22-88ce-4035-a53e-2bafb0c3310b
relation.isAuthorOfPublication.latestForDiscovery: 6ea008bf-69ee-4104-a942-2033b5b07ab8

Files

Original bundle

Name: paper-19.pdf
Size: 297.91 KB
Format: Adobe Portable Document Format