RT Journal Article
T1 CAVLCU: an efficient GPU-based implementation of CAVLC
A1 Fuentes-Alventosa, Antonio
A1 Gómez-Luna, Juan
A1 González-Linares, José María
A1 Guil-Mata, Nicolás
A1 Medina-Carnicer, Rafael
K1 Data compression (Computer science)
K1 Images - Compression
K1 Image processing - Digital techniques
K1 Video compression
AB In this paper, we present CAVLCU, an efficient implementation of CAVLC on GPU, which is based on four key ideas. First, we use only one kernel to avoid the long-latency global memory accesses required to transmit intermediate results among different kernels, and the costly launches and terminations of additional kernels. Second, we apply an efficient synchronization mechanism for thread-blocks that process adjacent frame regions (in the horizontal and vertical dimensions) to share results in global memory space. (To prevent confusion, a block of pixels of a frame is referred to simply as a block, and a GPU thread block as a thread-block.) Third, we fully exploit the available global memory bandwidth by using vectorized loads to move the quantized transform coefficients directly to registers. Fourth, we use register tiling to implement the zigzag sorting, thus obtaining high instruction-level parallelism. An exhaustive experimental evaluation showed that our approach is between 2.5× and 5.4× faster than the only state-of-the-art GPU-based implementation of CAVLC.
PB Springer Nature
YR 2022
FD 2022
LK https://hdl.handle.net/10630/37825
UL https://hdl.handle.net/10630/37825
LA eng
NO Fuentes-Alventosa, A., Gómez-Luna, J., González-Linares, J.M. et al. CAVLCU: an efficient GPU-based implementation of CAVLC. J Supercomput 78, 7556–7590 (2022). https://doi.org/10.1007/s11227-021-04183-8
NO Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
DS RIUMA. Repositorio Institucional de la Universidad de Málaga
RD 4 May 2026