CAVLCU: an efficient GPU-based implementation of CAVLC
Files
Description: Main article
Publisher
Springer Nature
Abstract
In this paper, we present CAVLCU, an efficient GPU implementation of CAVLC based on four key ideas. First, we use a single kernel, which avoids both the long-latency global memory accesses needed to pass intermediate results between kernels and the cost of launching and terminating additional kernels. Second, we apply an efficient synchronization mechanism that lets thread-blocks processing adjacent frame regions (horizontally and vertically) share results through global memory. Third, we fully exploit the available global memory bandwidth by using vectorized loads to move the quantized transform coefficients directly into registers. Fourth, we use register tiling to implement the zigzag sorting, which yields high instruction-level parallelism. An exhaustive experimental evaluation showed that our approach is between 2.5× and 5.4× faster than the only state-of-the-art GPU-based implementation of CAVLC.

Note on terminology: to prevent confusion, a block of pixels of a frame is referred to simply as a block, and a GPU thread block as a thread-block.
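The zigzag sorting named in the fourth idea can be illustrated with a minimal CPU-side C++ sketch. It is not the authors' CUDA kernel: it assumes each 4×4 block of quantized coefficients is stored in row-major order and uses the standard H.264 4×4 frame (progressive) zigzag order, and the names `Block4x4`, `kZigzag4x4`, `zigzag_scan`, `CoeffToken` and `coeff_token_inputs` are hypothetical. What it shows is that the scan is a fixed permutation of 16 values driven by a compile-time table, which is the property that lets a GPU version keep a whole block in registers. The sketch also derives TotalCoeff and TrailingOnes, two of the values CAVLC encodes for each block.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical type: one 4x4 block of quantized coefficients.
using Block4x4 = std::array<int16_t, 16>;

// 4x4 frame zigzag scan: entry k is the row-major index (row * 4 + col)
// of the k-th coefficient in scan order. Assumes row-major block storage.
constexpr std::array<int, 16> kZigzag4x4 = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15};

// Reorder one block into zigzag order. The loop has a fixed trip count and
// reads a constant table; on a GPU, full unrolling (e.g. #pragma unroll in
// CUDA) turns every index into a compile-time constant, so both arrays can
// live in registers instead of local memory.
inline Block4x4 zigzag_scan(const Block4x4& raster) {
    Block4x4 scanned{};
    for (int k = 0; k < 16; ++k) {
        scanned[k] = raster[kZigzag4x4[k]];
    }
    return scanned;
}

// Two CAVLC inputs computed from a zigzag-ordered block:
//   total_coeff   = number of nonzero levels (TotalCoeff)
//   trailing_ones = number of +/-1 levels, at most 3, at the high-frequency
//                   end of the nonzero levels; zeros in between are skipped.
struct CoeffToken {
    int total_coeff;
    int trailing_ones;
};

inline CoeffToken coeff_token_inputs(const Block4x4& scanned) {
    CoeffToken t{0, 0};
    bool counting_ones = true;  // stays true until a level breaks the run
    for (int k = 15; k >= 0; --k) {  // walk from highest frequency down
        const int level = scanned[k];
        if (level == 0) continue;
        ++t.total_coeff;
        if (counting_ones && std::abs(level) == 1 && t.trailing_ones < 3) {
            ++t.trailing_ones;
        } else {
            counting_ones = false;
        }
    }
    return t;
}
```

For example, for a block whose zigzag-ordered levels are 0, 3, -1, 0, 0, -1, 1, 0, 1 followed by zeros, `coeff_token_inputs` returns TotalCoeff = 5 and TrailingOnes = 3. A lookup table is used here rather than computing diagonals on the fly because a constant permutation is what allows the indexing to be resolved at compile time.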
Bibliographic citation
Fuentes-Alventosa, A., Gómez-Luna, J., González-Linares, J.M. et al. CAVLCU: an efficient GPU-based implementation of CAVLC. J Supercomput 78, 7556–7590 (2022). https://doi.org/10.1007/s11227-021-04183-8
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0)