    • Acelerando los momentos de Zernike sobre Kepler

      Ruiz, Antonio; Ujaldon-Martinez, Manuel (2014-05-02)
      This work analyzes the most advanced features of Nvidia's Kepler architecture, mainly dynamic parallelism for launching kernels from the GPU and thread scheduling with Hyper-Q. ...
    • CUVLE: Variable-Length Encoding on CUDA 

      Fuentes-Alventosa, Antonio; Gómez-Luna, Juan; González-Linares, José M.; Guil, Nicolás (2014-10-14)
      Data compression is the process of representing information in a compact form, in order to reduce the storage requirements and, hence, communication bandwidth. It has been one of the critical enabling technologies for ...
    • Entropy-based High Performance Computation of Boolean SNP-SNP Interactions Using GPUs 

      Riveros, Carlos; Ujaldon-Martinez, Manuel; Moscato, Pablo (2014-05-02)
      It is being increasingly accepted that traditional statistical Single Nucleotide Polymorphism (SNP) analysis of Genome-Wide Association Studies (GWAS) reveals just a small part of the heritability in complex diseases. ...
    • Floating Point Square Root under HUB Format 

      Villalba-Moreno, Julio; Hormigo-Aguilar, Javier (2017-09-26)
      Half-Unit-Biased (HUB) is an emerging format based on shifting the representation line of the binary numbers by half a unit in the last place. The HUB format is especially relevant for computers where rounding to nearest is ...
    • Generating order policies by SDP: non-stationary demand and service level constraints 

      Pauls-Worm, Karin G.J.; Hendrix, Eligius Maria Theodorus (2015-07-06)
      Inventory control implies dynamic decision making. Therefore, dynamic programming seems an appropriate approach to look for order policies. For finite horizon planning, the implementation of service level constraints ...
    • GPUs for high performance computing, Deep learning and beyond 

      Ujaldon-Martinez, Manuel (2019-10-24)
      After an impressive evolution over the last decade, Graphics Processing Units (GPUs) are now a well-established way to accelerate scientific applications. This talk unveils the GPU architecture from an ...
    • GPUs para HPC: Logros y perspectivas futuras 

      Ujaldon-Martinez, Manuel (2013-10-18)
      A decade ago we began improving the first scientific applications on GPUs using Cg and OpenGL. Now CUDA and OpenCL have taken over, setting a dizzying pace in the acceleration of codes coming from ...
    • Heuristics for Longest Edge Selection in Simplicial Branch and Bound 

      Herrera, Juan F.R.; Casado, Leocadio G.; Hendrix, Eligius Maria Theodorus; García, Inmaculada (2015-07-06)
      Simplicial partitions are suitable to divide a bounded area in branch and bound. In the iterative refinement process, a popular strategy is to divide simplices by their longest edge, thus avoiding needle-shaped simplices. ...
    • Improving Fixed-Point Implementation of QR Decomposition by Rounding-to-Nearest 

      Muñoz, Sergio D.; Hormigo-Aguilar, Javier (2015-06-29)
      QR decomposition is a key operation in many current communication systems. This paper shows how to reduce the area of a fixed-point QR decomposition implementation based on Givens rotations by using a new number ...
    • Improving Transactional Memory Performance for Irregular Applications 

      Pedrero, Manuel; Gutiérrez, Eladio; Romero, Sergio; Plata, Óscar (2015-06-11)
      Transactional memory (TM) offers optimistic concurrency support in modern multicore architectures, helping programmers to extract parallelism in irregular applications when data dependence information is not available ...
    • Insights into the Fallback Path of Best-Effort Hardware Transactional Memory Systems 

      Quislant, Ricardo; Gutierrez-Carrasco, Eladio Damian; Zapata, Emilio L.; Plata-Gonzalez, Oscar Guillermo (Springer International Publishing, 2016-08-24)
      Current industry proposals for Hardware Transactional Memory (HTM) focus on best-effort solutions (BE-HTM) where hardware limits are imposed on transactions. These designs may show a significant performance degradation ...
    • Patrón pipeline aplicado a arquitecturas heterogéneas big.LITTLE 

      Vilches, Antonio; Rodriguez, Andres; Navarro, Ángeles; Corbera, Francisco; Asenjo-Plaza, Rafael (2015-09-25)
      In this work, we propose a solution to enable the execution of streaming applications, which consist of a series of stages, on heterogeneous architectures with a multicore and an integrated GPU. To this end, ...
    • Performance Analysis of the Multi-pass Transformation for Complex 3D-Stencils on GPUs 

      López-Zapata, Emilio; Romero, Luis F.; Tabik, Siham (2013-09-25)
    • Reducing overheads of dynamic scheduling on heterogeneous chips 

      Corbera, Francisco; Rodríguez, Andrés; Asenjo-Plaza, Rafael; Navarro, Ángeles; Vilches, Antonio; [et al.] (arXiv.org (Cornell University Library), 2015-01-19)
      In recent processor development, we have witnessed the integration of GPUs and CPUs into a single chip. The result of this integration is a reduction of the data communication overheads. This enables an efficient collaboration ...
    • Robust tracking for augmented reality 

      González-Linares, José M.; Guil, Nicolás; Ramos Cózar, Julián (2015-06-17)
      In this paper, a method for improving a tracking algorithm in an augmented reality application is presented. This method addresses several issues specific to this particular application, like marker-less tracking and color constancy ...
    • Simplified Floating-Point Units for High Dynamic Range Image and Video Systems 

      Hormigo-Aguilar, Javier; Villalba-Moreno, Julio (2015-06-29)
      The upcoming arrival of high dynamic range image and video applications in consumer electronics will force the use of floating-point numbers in them. This paper shows that introducing a slight modification on ...